there is no need to rewrite them in independent form, because, for the reasons noted infra, the 
claims from which they depend are patentable. 

The objection to claim 18 is respectfully traversed in view of the cancellation 
of this claim. 

The rejection of claims 9, 11, 18-20, 23, 27, 29, 34, 38, 40, 45, 49, 51, 56, and 
63 under 35 U.S.C. § 112, first paragraph, for lack of enablement is respectfully traversed in 
view of the above amendments. 

The rejection of claims 69 and 70 under 35 U.S.C. § 112, first paragraph, for 
lack of enablement is obviated in view of the cancellation of these claims. 

The rejection of claims 5-7, 9, 12, 13, 25, 30, 31, 36, 41, 42, 47, 52, 53, 59, 64, 
and 65 under 35 U.S.C. § 112, first paragraph, for lack of written description is respectfully 
traversed in view of the above amendments. Claims 9, 25, 30, 31, 36, 41, 42, 47, 52, and 53 
have been canceled. Further, applicant has limited independent claims 5 and 59 to "[a]n 
isolated DNA molecule encoding a protein subunit of polymerase III holoenzyme from a 
eubacterial prokaryote" and independent claims 14 and 54 to "[a]n isolated protein subunit of 
polymerase III holoenzyme from a eubacterial prokaryote". The DNA molecules encoding the 
δ' and δ subunits of E. coli, disclosed in the present application, are representative 
species of DNA molecules encoding a protein subunit of polymerase III holoenzyme from a 
eubacterial prokaryote and, therefore, applicant has provided written description for the 
independent claims and any claims which depend therefrom. 

It is well known to one skilled in the art that proteins homologous to the δ' 
subunit of the E. coli polymerase III holoenzyme are contained in organisms other than 
E. coli, as shown in the Declaration of Michael O'Donnell under 37 CFR § 1.132 submitted 
in parent U.S. Patent Application Serial No. 08/279,058 on December 17, 1996 
("O'Donnell Declaration") and the Supplemental Declaration of Michael O'Donnell under 37 
CFR § 1.132 ("Supp. O'Donnell Declaration") (submitted herewith). 

Those skilled in the art recognize the δ' subunit from E. coli has sequence 
homology to accessory protein complexes of various other organisms (O'Donnell Declaration 
¶ 13). For example, in O'Donnell et al., "Homology in Accessory Proteins of Replicative 
Polymerases - E. coli to Humans," Nucleic Acids Research 21(1):1-3 (1993), a comparison of 
amino acid sequences shows the homology between proteins of replicative polymerases of E. 
coli, humans, and phage T4 (Id.). In Carter et al., "Identification, Isolation, and 
Characterization of the Structural Gene Encoding the δ' Subunit of Escherichia coli DNA 
Polymerase III Holoenzyme," J. of Bacteriology, 175(12):3812-22 (1993), Figure 5 diagrams 
the homology of the δ' amino acid sequence to other replication proteins (Id.). Comparison 
of the δ' amino acid sequence revealed similarity to the A1 (replication factor C) complex of 
HeLa cells and to the gene 44 protein (gp44) of bacteriophage T4 (Id.). In addition, amino 
acid sequence similarity was found to the gene product of B. subtilis (Id.). Further, the 
structural homology of the δ' subunit to other replication proteins has since been confirmed 
(Id.). For example, the genome project of Haemophilus influenzae showed homologues to all 
10 subunits of E. coli DNA polymerase III holoenzyme, including δ, δ', χ, ψ and θ (Id.). 
GenBank now also shows homologues to the δ' subunit of E. coli from a large 
variety of organisms, including the following prokaryotes: Escherichia coli, Haemophilus 
influenzae, Micrococcus luteus, Pseudomonas aeruginosa, Bacillus subtilis, and Caulobacter 
crescentus (Id.). 

As to the δ and δ' subunits of polymerase III holoenzyme, it is well known to 
one skilled in the art that proteins in other organisms have functional and structural homology 
to the subunits of E. coli (O'Donnell Declaration ¶¶ 10-16). 

Various genome projects for many different organisms have resulted in the 
gene sequences for various bacteria being publicly available on various web sites (Supp. 
O'Donnell Decl. ¶ 5). As described more fully below, the amino acid sequences for the δ and 
δ' subunits for E. coli, disclosed in the present application, were used, by applicant and others, 
in a BLAST search program (Altschul, et al., "Basic Local Alignment Search Tool," J. Mol. 
Bio. 215:403-10 (1990)) to identify the presence of genes encoding these proteins in other 
eubacterial prokaryotes (Id.). As explained in the textbook Molecular Biology of the Gene 
(attached to the Supp. O'Donnell Decl. at Appendix A), eubacterial (i.e. true bacteria) 
prokaryotes are a distinct kingdom separate from eukaryotes and archaebacteria and include: 
Aquificales (includes Aquifex aeolicus), Chlamydiales, Coprothermobacter, Cyanobacteria, 
Green Sulfur bacteria (includes Porphyromonas gingivalis and Chlorobium tepidum), 
Fibrobacter group, Firmicutes (Gram positives including Mycobacterium, Clostridium 
acetobutylicum, Streptococcus pneumoniae, Streptococcus pyogenes, Staphylococcus aureus, 
Bacillus subtilis), Flexistipes group, Fusobacteria, Green non-sulfur bacteria, Holophaga 
group, Nitrospira group, Planctomycetales, Proteobacteria (includes the alpha subdivision 
(e.g. Caulobacter crescentus), the beta group (e.g. Bordetella pertussis and Neisseria 
meningitidis), the delta/epsilon subdivisions (e.g. Campylobacter jejuni and Helicobacter 
pylori), and the gamma subdivision (e.g. the Enterobacteriaceae that includes Haemophilus 
influenzae, Yersinia pestis, Vibrio cholerae, Escherichia coli, Pasteurella multocida, 
Pseudomonas aeruginosa, Salmonella typhi, Shewanella putrefaciens)), Spirochaetales 
(includes Borrelia burgdorferi, Treponema pallidum), Synergistes group, 
Thermodesulfobacterium group, Thermotogales (includes Thermotoga maritima), 
Thermus/Deinococcus group (includes Thermus thermophilus and Deinococcus radiodurans), 
and a variety of as yet unclassified bacteria. The results of these analyses are set forth below 
(Id.). 

The sequence analysis of Haemophilus influenzae is found at 
http://www.tigr.org/tdb/mdb/hidb/hidb.html (Supp. O'Donnell Decl. ¶ 6). A copy of that web 
site listing is attached to the Supp. O'Donnell Decl. at Appendix B with the δ subunit 
encoding gene being identified as HI0923 and the δ' subunit encoding DNA molecule being 
identified as HI0455 (Id.). This listing shows that the δ subunit encoding DNA molecule of 
Haemophilus influenzae is 62.0% similar to the δ subunit encoding DNA molecule of E. coli 
(Id.). Likewise, the δ' subunit of Haemophilus influenzae is shown to be 57.4% similar to the 
δ' subunit encoding DNA molecule of E. coli (Id.). 

The genome of Neisseria gonorrhoeae is found at the web site 
http://www.genome.on.edu (Supp. O'Donnell Decl. ¶ 7). Search for the δ subunit amino acid 
sequence yields a contig. with a very high probability of 1.2 x 10⁻²⁵, contig. 188, while the δ' 
amino acid sequence yields a contig. of high probability of 1.2 x 10⁻¹⁴, #200 (Id.). See 
Appendix C attached to the Supp. O'Donnell Decl. 

The genome for Shewanella putrefaciens is found on the TIGR BLAST server 
(Supp. O'Donnell Decl. ¶ 8). A search for the δ subunit produced the high score of 1.1 x 10⁻⁵⁴ 
for contig. gsp 230, while the search for δ' subunit produced the high score of 6.4 x 10⁻²⁷ for 
contig. gsp 271 (Id.). See Appendix D attached to the Supp. O'Donnell Decl. 

The genome for Vibrio cholerae is found at 
http://www.tigr.org/cgi-bin/BlastSearch/blast.cgi?organism=v.cholerae (Supp. O'Donnell 
Decl. ¶ 9). A search for the δ subunit produced the high score of 6.9 x 10⁻⁸² for contig. 
asm 937, while the search for δ' subunit produced the high score of 8.1 x 10⁻³⁷ for contig. 
asm 894 (Id.). See Appendix E attached to the Supp. O'Donnell Decl. 

The genomes for Pseudomonas aeruginosa (see Appendix F attached to the 
Supp. O'Donnell Decl.), Salmonella typhi (see Appendix G attached to the Supp. O'Donnell 
Decl.), and Yersinia pestis (see Appendix H attached to the Supp. O'Donnell Decl.) are found 
at http://www.ncbi.nlm.nih.gov/Blast/unfinished genomes (Supp. O'Donnell Decl. ¶ 10). For 
these, the amino acid sequences of E. coli δ and δ' were used in BLAST searches (Id.). The 
high scores, given below, are all sufficiently significant to call the identified gene the one that 
performs the homologous function in E. coli (Id.): 

Pseudomonas aeruginosa 

    δ  - 7 x 10⁻³⁴     contig. 52 
    δ' - 9 x 10⁻²⁷     contig. 50 

Salmonella typhi 

    δ  - 1 x 10⁻¹⁶¹    contig. 1564 
    δ' - 8 x 10⁻¹⁰     contig. 870 

Yersinia pestis 

    δ  - 1 x 10⁻¹²⁷    contig. 803 
    δ' - 9 x 10⁻⁹⁸     contig. 51 
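
As an illustration only, the values tabulated above can be checked against a numeric cutoff. The sketch below is not part of the declaration: the cutoff of 1 x 10^-9 is borrowed from the later remark that even such a score suffices to identify a homologous gene, and applying it to this table is an assumption, as are the names SCORES, CUTOFF, and significant.

```python
# BLAST values transcribed from the table above, keyed by (organism, subunit).
SCORES = {
    ("Pseudomonas aeruginosa", "delta"): 7e-34,
    ("Pseudomonas aeruginosa", "delta'"): 9e-27,
    ("Salmonella typhi", "delta"): 1e-161,
    ("Salmonella typhi", "delta'"): 8e-10,
    ("Yersinia pestis", "delta"): 1e-127,
    ("Yersinia pestis", "delta'"): 9e-98,
}

# Assumed cutoff: smaller values indicate a less likely chance match.
CUTOFF = 1e-9


def significant(scores, cutoff=CUTOFF):
    """Map each (organism, subunit) pair to True if its value is at or below the cutoff."""
    return {key: value <= cutoff for key, value in scores.items()}


def weakest_hit(scores):
    """Return the (organism, subunit) pair with the largest, i.e. least significant, value."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for key, ok in significant(SCORES).items():
        print(key, "passes" if ok else "fails")
    print("weakest hit:", weakest_hit(SCORES))
```

Under this assumed cutoff, every entry in the table passes, and the least significant entry is the Salmonella typhi δ' hit.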



Thus, for Gram negative bacteria such as Haemophilus influenzae, Neisseria 
gonorrhoeae, Shewanella putrefaciens, Vibrio cholerae, Pseudomonas aeruginosa, 
Salmonella typhi, and Yersinia pestis, there is a high level of homology between the δ and δ' 
subunits of those bacteria and the δ and δ' subunits of E. coli (Supp. O'Donnell Decl. ¶ 11). 

For other eubacteria, there is significant homology between their δ' subunit 
and that of E. coli (Supp. O'Donnell Decl. ¶ 12). In all eubacteria, the δ subunit can be 
identified starting with the E. coli δ subunit as comparison, but, since it is not as conserved as 
the δ' subunit, one must "walk" from one organism to another, as discussed in ¶ 23 below 
(Id.). 

In Himmelreich et al., "Complete Sequence Analysis of the Genome of the 
Bacterium Mycoplasma pneumoniae," Nucleic Acids Research 24(22):4420-4449 (1996), the 
δ' subunit of Mycoplasma pneumoniae is identified as being homologous to the δ' subunit of 
E. coli in Table 1 on page 4426 (Supp. O'Donnell Decl. ¶ 13). See Appendix I attached to the 
Supp. O'Donnell Decl. 

In Kunst et al., "The Complete Genome Sequence of the Gram-positive 
Bacterium Bacillus subtilis," Nature 390:249-256 (1997), the δ' subunit of Bacillus subtilis is 
identified as being homologous to the δ' subunit of E. coli in the table on page 248 (Supp. 
O'Donnell Decl. ¶ 14). See Appendix J attached to the Supp. O'Donnell Decl. 

The genome for Streptococcus pyogenes is found in the University of 
Oklahoma server (i.e. http://www.ncbi.nlm.nih.gov.BLAST/tigrbl.html) (Supp. O'Donnell 
Decl. ¶ 15). δ' produced the high score of 3.3 x 10⁻¹⁰ for contig. 218 (Id.). See Appendix K 
attached to the Supp. O'Donnell Decl. 

The genome for Enterococcus faecalis is found on the TIGR BLAST search 
server (Supp. O'Donnell Decl. ¶ 16). δ' produced the high score of 9.6 x 10⁻¹⁶ for contig. 
6277 (Id.). See Appendix L attached to the Supp. O'Donnell Decl. 

The genome for Streptococcus pneumoniae is found on the TIGR BLAST 
search server (Supp. O'Donnell Decl. ¶ 17). δ' produced the high score of 2.4 x 10⁻¹² for 
contig. sp 68 (Id.). See Appendix M attached to the Supp. O'Donnell Decl. 

The genome for Aquifex aeolicus is found in Deckert et al., "The Complete 
Genome of the Hyperthermophilic bacterium Aquifex aeolicus," Nature 392:353-358 (1998) 
and at http://www.ncbi.nlm.nih.gov/Blast/unfinished genomes (Supp. O'Donnell Decl. ¶ 18). 
δ' produced the high score of 8 x 10⁻¹³ (position 1303996-1304394) (Id.). See Appendix N 
attached to the Supp. O'Donnell Decl. 

The genome for Thermotoga maritima is found in the TIGR BLAST server 
page (Supp. O'Donnell Decl. ¶ 19). δ' yields a high score of 3.7 x 10⁻¹⁵ for contig. tm 26 (Id.). 
See Appendix O attached to the Supp. O'Donnell Decl. 

In Spirochaetes, Tomb et al., "The Complete Genome Sequence of the Gastric 
Pathogen Helicobacter pylori," Nature 388:539-547 (1997) (see Appendix P attached to the 
Supp. O'Donnell Decl.) and Fraser et al., "Genomic Sequence of a Lyme Disease 
Spirochaete, Borrelia burgdorferi," Nature 390:580-586 (1997) (see Appendix Q attached to 
the Supp. O'Donnell Decl.), Helicobacter pylori and Borrelia burgdorferi are identified to 
have δ' subunits (Supp. O'Donnell Decl. ¶ 20). For Helicobacter pylori, δ' is listed in the 
table as HP 1231 (Id.). For Borrelia burgdorferi, using the NCBI genome search page 
(Ncbi.nlm.nih.gov/Blast/unfinished genomes), δ' gives the high score of 8 x 10⁻⁷ (Id.). See 
Appendix R attached to the Supp. O'Donnell Decl. 

In Andersson et al., "The Genome Sequence of Rickettsia prowazekii and the 
Origin of Mitochondria," Nature 396:133-140 (1998), Rickettsia prowazekii is identified to 
have a δ' subunit, identified as RP172 (Supp. O'Donnell Decl. ¶ 21). See Appendix S 
attached to the Supp. O'Donnell Decl. 

A large compilation of genome sequences is at the web site 
http://www.ncbi.nlm.gov/Blast/unfinished genome.html (Supp. O'Donnell Decl. ¶ 22). The 
eubacterial genomes were searched using the δ' subunit of E. coli (Id.). All organisms in 
eubacteria scored very high with identity levels sufficient to identify the holB gene encoding 
δ' conclusively (Id.). This is seen in Figure 1, which shows a path of one-on-one comparative 
alignments, each of which starts with E. coli, together with the alignments themselves (Id.) 
(Appendix T attached to the Supp. O'Donnell Decl.). In this figure, within the parentheses, 
is the percent identity and the ratio of the number of identities (i.e. the numerator) over the 
length of the amino acid sequence that was compared (i.e. the denominator) (Id.). The number 
outside of the parentheses is the score obtained in the BLAST program (i.e. even a score of 
1 x 10⁻⁹ is a sufficiently high score to identify the homologous gene) (Id.). 
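
The percent identity and ratio described above can be sketched as follows. This is a minimal illustration, not the BLAST program itself: it assumes two sequences that have already been aligned to equal length with "-" marking gaps, it takes the denominator to be the number of aligned columns, and the sequences and the function name percent_identity are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[int, int, float]:
    """Return (identities, compared_length, percent) for two pre-aligned sequences.

    Assumption: the denominator is the full number of aligned columns,
    including columns where one sequence has a gap ("-").
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = len(aligned_a)
    # A column counts as an identity only when both residues match and neither is a gap.
    identities = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    percent = 100.0 * identities / compared if compared else 0.0
    return identities, compared, percent


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from Figure 1.
    ids, length, pct = percent_identity("MKLV-A", "MKIVQA")
    print(f"{pct:.0f}% ({ids}/{length})")
```

The printed form mirrors the figure's convention of a percentage followed by the identities-over-length ratio in parentheses.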

A similar search with the δ subunit of E. coli identified the holA gene of 
Neisseria and Thiobacillus as high matches, and holA of other enteric bacteria produced high 
scores as well (Supp. O'Donnell Decl. ¶ 23). Repetition of this procedure using Neisseria δ 
easily allows the identification of δ in Aquifex aeolicus (Id.). Use of Aquifex aeolicus δ 
identifies δ of Enterococcus (which identifies Bacillus δ, then Streptococcus δ, then 
Synechocystis, and then Porphyromonas δ) (Id.). Use of Aquifex aeolicus δ also identifies 
Thermotoga δ, which identifies Spirochaetes (Borrelia) δ subunit (Id.). Use of Thiobacillus δ 
identifies δ from Helicobacter camylobacter (Id.). There is a region at about 100 residues 
that is rather well conserved in δ across eubacteria and if this were used, the scores could be 
even higher yet (Id.). Figure 2 shows this "walking" procedure and shows the scores and 
percent identities obtained as a result of this procedure starting from the δ subunit of E. coli 
as well as alignments (Id.). This figure is substantially the same as Figure 1 but within the 
parentheses, after the percentage identity, there is another ratio and another percentage based 
on homologies (Id.). Figure 2 does not show scores for individual Gram negative bacteria of 
the Enterobacteria class (called enterics) as they are highly related to E. coli and the scores 
are very high (Id.). 
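
The "walking" procedure described above can be pictured as a search over a graph in which each organism points to the organisms whose delta subunit it identified. The sketch below is illustrative only. The edges are transcribed from the paragraph above, reading the parenthetical as a chain (Enterococcus, then Bacillus, then Streptococcus, then Synechocystis, then Porphyromonas), which is an assumption; no scores are modeled, and the names HITS and walk are hypothetical.

```python
from collections import deque

# Query organism -> organisms whose delta subunit was identified using it.
# Transcribed from the text; the chain after Enterococcus is an assumed reading.
HITS = {
    "E. coli": ["Neisseria", "Thiobacillus"],
    "Neisseria": ["Aquifex aeolicus"],
    "Aquifex aeolicus": ["Enterococcus", "Thermotoga"],
    "Enterococcus": ["Bacillus"],
    "Bacillus": ["Streptococcus"],
    "Streptococcus": ["Synechocystis"],
    "Synechocystis": ["Porphyromonas"],
    "Thermotoga": ["Borrelia"],
    "Thiobacillus": ["Helicobacter"],
}


def walk(start, target, hits=HITS):
    """Breadth-first search for the shortest chain of searches from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in hits.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target not reachable from start


if __name__ == "__main__":
    print(" -> ".join(walk("E. coli", "Borrelia")))
```

Each step in the returned path corresponds to one repetition of the search, using the subunit found in the previous organism as the new query.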

Therefore, those of ordinary skill in the art, using the sequence information in 
the present application, would have been able to (and, in fact, did) identify and isolate the δ 
and δ' subunits of polymerase III holoenzyme (and their encoding genes) from eubacteria 
other than E. coli (See Supp. O'Donnell Declaration ¶ 24). 

Further, the sequences of the eubacterial homologues to δ', and indeed the other 
δ' homologues, are sufficiently homologous to the δ' subunit of E. coli to provide for 
identifying and obtaining the corresponding δ' (holA) gene from these organisms using the 
gene encoding the δ' subunit of E. coli in the following ways: (1) use of the E. coli holA 
gene, or fragments of the E. coli gene, as a probe in a Southern analysis of whole cell DNA 
from another organism to identify the corresponding δ' homologue; (2) use of holA, or its 
fragments, as a probe to screen cDNA plasmid libraries of other organisms; (3) use of the 
holA gene sequence to synthesize oligonucleotide primers for PCR to amplify the 
corresponding δ' homologue from total genomic DNA from other organisms; and (4) use of 
the holA gene sequence to identify the δ' homologue from a genome sequencing project of 
other organisms by sequence comparison to the E. coli holA gene (O'Donnell Declaration 
¶ 14). 

The present application fully discusses the isolation and sequencing of the δ' 
and δ subunits and their encoding genes for the polymerase III holoenzyme. In view of the 
disclosure of these experimental procedures, the known structural and functional homology 
of the δ' and δ subunit proteins from various sources such as numerous other prokaryotes, 
and the present amendment limiting claims 5, 14, 54, and 59 to an isolated DNA molecule 
encoding a protein subunit of polymerase III holoenzyme from a eubacterial prokaryote and 
an isolated protein subunit of polymerase III holoenzyme from a eubacterial prokaryote, it 
would not require an undue amount of experimentation for one skilled in the art to isolate and 
sequence the claimed δ' and δ proteins (and their encoding genes) from eubacterial prokaryote 
sources other than E. coli. 

The rejection of claims 14-16, 21, 24, 32, 35, 43, 46, 54, 57, and 66-75 under 
35 U.S.C. § 112, first paragraph, for lack of written description is respectfully traversed in 
view of the above remarks. In addition, claims 21, 24, 32, 35, 43, 46, and 66-75 have been 
canceled. 

The rejection of claims 66-75 under 35 U.S.C. § 112, second paragraph, for 
indefiniteness is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 32-35 under 35 U.S.C. § 102(b) as anticipated by 
Yoshikawa et al., "Cloning and Nucleotide Sequencing of the Genes rimI and rimJ which 
Encode Enzymes Acetylating Ribosomal Proteins S18 and S5 of Escherichia coli K12," Mol. 
Gen. Genet., 209:471-488 (1987) ("Yoshikawa") is respectfully traversed in view of the 
cancellation of these claims. 

The rejection of claims 36, 37, 39, 41, and 42 under 35 U.S.C. § 102(b) as 
anticipated by Yoshikawa is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 43, 45, and 46 under 35 U.S.C. § 102(b) as anticipated 
by Stirling et al., "xerB, an Escherichia coli Gene Required for Plasmid ColE1 Site-Specific 
Recombination, is Identical to pepA, Encoding Aminopeptidase A, a Protein with Substantial 
Similarity to Bovine Lens Leucine Aminopeptidase," EMBO J., 8:1623-1627 (1989) 
("Stirling") is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 47, 49, 50, 51, 52, and 53 under 35 U.S.C. § 102(b) as 
anticipated by Stirling is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 54 and 56-58 under 35 U.S.C. § 102(b) as anticipated 
by Takase et al., "Genes Encoding Two Lipoproteins in the leuS-dacA Region of the 
Escherichia coli Chromosome," J. Bac., 169:5692-5699 (1987) ("Takase") is respectfully 
traversed. 

Takase relates to the coding of two lipoproteins by two genes, rlpA and rlpB, 
located in the leuS-dacA region on the Escherichia coli chromosome (O'Donnell Declaration 
¶ 17). The rlpA gene encodes a lipoprotein having a molecular weight of 36K (Id.). Figure 
6 of the reference details the sequence of the 36K lipoprotein gene rlpA and its 5'- and 3'- 
flanking regions and the amino acid sequences deduced from the nucleotide sequence (Id.). 
The position of the PTO is that this sequence matches that of the sequence encoding the 
claimed δ subunit. Applicant respectfully disagrees. This sequence is not holA; it is rlpA 
and rlpB, the subject of Takase. At the end of the sequence, past both rlpA and rlpB, are 230 
base pairs that were not discussed (Id.). This sequence encodes the first 20-25% of the holA 
gene sequence (Id.). Takase did not recognize this to be an open reading frame of a putative 
unknown gene, nor did the reference disclose the complete sequence of the holA gene (Id.). 
The diagram attached as Exhibit 9 to the O'Donnell Declaration shows the overlap between 
the disclosed rlpB gene of Takase and the holA gene encoding the claimed δ subunit (Id.). 
Thus, the δ protein subunit of polymerase III holoenzyme and the gene encoding the δ protein 
subunit of the polymerase III holoenzyme of the present invention are not disclosed by 
Takase. 

In particular, Takase only discloses a portion of the holA gene encoding the δ 
protein subunit of polymerase III holoenzyme and does not disclose the nucleotide or protein 
sequences for the entire δ subunit. 

In contrast, claim 54 relates to "[a]n isolated protein subunit of polymerase III 
holoenzyme from a eubacterial prokaryote, wherein the subunit group is δ." Further, claim 
59 relates to "[a]n isolated DNA molecule encoding a protein subunit of polymerase III 
holoenzyme from a eubacterial prokaryote, wherein the subunit group is δ." Takase does not 
teach the entire specified isolated protein subunit of polymerase III holoenzyme, nor the 
entire gene encoding that protein. Further, Takase does not disclose the claimed expression 






Serial No. 08/828,323 




system or host cell. Since Takase does not disclose the entire δ protein subunit nor the entire 
sequence encoding the δ protein subunit, there is no basis for an anticipation rejection. 

The outstanding office action places great reliance on the results from use of 
the MPSearch sequence analysis software employing the Smith-Waterman algorithm. 
Applicant has not been provided with this analysis or algorithm and, therefore, has great 
difficulty responding to this aspect of the outstanding office action. In any event, however, it 
is beyond dispute that Takase fails to disclose the complete sequences for the δ subunit. The 
results attributed to the MPSearch sequence analysis and Smith-Waterman algorithm are thus 
contrary to fact, as demonstrated by the O'Donnell Declaration and Takase itself. Since there 
is no reasonable basis for the rejection over Takase, that rejection must be withdrawn. 

The rejection of claims 59, 60, 64, and 65 under 35 U.S.C. § 102(b) as 
anticipated by Takase is respectfully traversed in view of the remarks in the preceding 
paragraphs. 

The rejection of claims 32-35 under 35 U.S.C. § 103(a) as being unpatentable 
over Yoshikawa is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 41 and 42 under 35 U.S.C. § 103(a) as being 
unpatentable over Yoshikawa is respectfully traversed in view of the cancellation of these 
claims. 

The rejection of claims 43-46 under 35 U.S.C. § 103(a) as being unpatentable 
over Stirling is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 52 and 53 under 35 U.S.C. § 103(a) as being 
unpatentable over Stirling is respectfully traversed in view of the cancellation of these claims. 

The rejection of claims 54-58 under 35 U.S.C. § 103(a) as being unpatentable 
over Takase is respectfully traversed. 

As stated above, Takase does not disclose the entire specified isolated δ 
protein subunit of polymerase III holoenzyme, nor the entire gene encoding that protein. In 
particular, Takase discloses only a short portion of the gene encoding the δ protein subunit. 
In addition, Takase provides no motivation to determine the sequence of the remainder of the 
gene. Specifically, Takase failed to identify the open reading frame of the gene for the δ 
protein subunit of polymerase III holoenzyme and, therefore, provides no motivation or 
suggestion to determine the remainder of the gene encoding the δ protein subunit. Further, 
the focus of Takase is on two genes, rlpA and rlpB, which are different from the gene 
encoding the δ protein subunit of polymerase III holoenzyme. As a result, Takase provides 
no motivation with respect to determining the sequence of the gene encoding the δ protein 
subunit of polymerase III holoenzyme. Therefore, the rejection based on this reference is 
improper and should be withdrawn. 

The rejection of claims 5, 7, 8, 10, 12, 13, 17, 21, 22, 25, 26, 28, 32, 36, 37, 
39, 41, 43, 47, 48, 50, 53, 55, 60, 62, and 65 under 35 U.S.C. § 101 as claiming the same 
invention as that of prior U.S. Patent Application Serial No. 08/279,058, now U.S. Patent No. 
5,668,004 to O'Donnell ("the '004 Patent") is respectfully traversed. 

Claims 21, 22, 25, 26, 28, 32, 36, 37, 39, 41, 43, 47, 48, 50, and 53 have been 
canceled. Further, the claims of the '004 Patent are limited to subunits of DNA polymerase 
III from Escherichia coli. In contrast, the claims of the present application are not limited to 
E. coli. Since the claims of the '004 Patent and those of the present application are not 
identical in scope, the rejection for same invention type double patenting is improper and 
should be withdrawn. 

The rejection of claims 9, 13, 17, 21, 23, 32, 38, 41, 43, 47, 49, 53, 63, and 65 
for obviousness-type double patenting as being unpatentable over the '004 Patent, to the 
extent those claims remain in the present application, is respectfully traversed in view of the 
terminal disclaimer filed herewith. 

In view of all the foregoing, it is submitted that this case is in condition for 
allowance and such allowance is earnestly solicited. 



Date: 



Respectfully submitted, 

Michael L. Goldman 
Registration No. 30,727 
Attorney for Applicant 




Nixon, Hargrave, Devans & Doyle LLP 
Clinton Square, P. O. Box 1051 
Rochester, New York 14603 
Telephone: (716) 263-1304 
Facsimile: (716) 263-1600 
SUPPLEMENTAL DECLARATION OF 
MICHAEL O'DONNELL UNDER 37 CFR § 1.132 

Assistant Commissioner for Patents 
Washington, D.C. 20231 

Dear Sir: 

I, MICHAEL O'DONNELL, pursuant to 37 CFR § 1.132, declare: 

1. I received a B.S. degree in Biochemistry from the University of Portland, 
Portland, Oregon in 1975 and a Ph.D. degree in Biochemistry from the University of 
Michigan, Ann Arbor, Michigan in 1982. I was a Postdoctoral Fellow at Stanford University, 
Stanford, California from 1982 to 1986. 

2. I am a Professor, Rockefeller University, New York, New York. In addition, I 
am an Investigator with Howard Hughes Medical Institute, Chevy Chase, Maryland. 

3. I am the sole named inventor of the above-identified application. 

4. I present this declaration to show (1) that proteins homologous to the δ' and δ 
subunits of DNA polymerase III holoenzyme are contained in eubacterial prokaryotes other 
than E. coli and (2) that, using the sequence information for the E. coli δ' and δ subunits in 
my present application, those skilled in the art could obtain these subunits from other 
eubacterial prokaryotes and, in fact, have done so. 

5. Various genome projects for many different organisms have resulted in the 
gene sequences for various bacteria being publicly available on various web sites. As 
described more fully below, the amino acid sequences for the δ and δ' subunits for E. coli, 
disclosed in my present application, were used, by myself and others, in a BLAST search 
program (Altschul, et al., "Basic Local Alignment Search Tool," J. Mol. Bio. 215:403-10 
(1990)) to identify the presence of genes encoding these proteins in other eubacterial 
prokaryotes. As explained in the textbook Molecular Biology of the Gene (attached hereto as 
Appendix A), eubacterial (i.e. true bacteria) prokaryotes are a distinct kingdom separate from 
eukaryotes and archaebacteria and include: Aquificales (includes Aquifex aeolicus), 
Chlamydiales, Coprothermobacter, Cyanobacteria, Green Sulfur bacteria (includes 
Porphyromonas gingivalis and Chlorobium tepidum), Fibrobacter group, Firmicutes (Gram 
positives including Mycobacterium, Clostridium acetobutylicum, Streptococcus pneumoniae, 
Streptococcus pyogenes, Staphylococcus aureus, Bacillus subtilis), Flexistipes group, 
Fusobacteria, Green non-sulfur bacteria, Holophaga group, Nitrospira group, 
Planctomycetales, Proteobacteria (includes the alpha subdivision (e.g. Caulobacter 
crescentus), the beta group (e.g. Bordetella pertussis and Neisseria meningitidis), the 
delta/epsilon subdivisions (e.g. Campylobacter jejuni and Helicobacter pylori), and the 
gamma subdivision (e.g. the Enterobacteriaceae that includes Haemophilus influenzae, 
Yersinia pestis, Vibrio cholerae, Escherichia coli, Pasteurella multocida, Pseudomonas 
aeruginosa, Salmonella typhi, Shewanella putrefaciens)), Spirochaetales (includes Borrelia 
burgdorferi, Treponema pallidum), Synergistes group, Thermodesulfobacterium group, 
Thermotogales (includes Thermotoga maritima), Thermus/Deinococcus group (includes 
Thermus thermophilus and Deinococcus radiodurans), and a variety of as yet unclassified 
bacteria. The results of these analyses are set forth below. 

6. The sequence analysis of Haemophilus influenzae is found at 
http://www.tigr.org/tdb/mdb/hidb/hidb.html. A copy of that web site listing is attached at 
Appendix B with the δ subunit encoding gene being identified as HI0923 and the δ' subunit 
encoding DNA molecule being identified as HI0455. This listing shows that the δ subunit 
encoding DNA molecule of Haemophilus influenzae is 62.0% similar to the δ subunit 
encoding DNA molecule of E. coli. Likewise, the δ' subunit of Haemophilus influenzae is 
shown to be 57.4% similar to the δ' subunit encoding DNA molecule of E. coli. 

7. The genome of Neisseria gonorrhoeae is found at the web site 
http://www.genome.on.edu. Search for the δ subunit amino acid sequence yields a contig. 
with a very high probability of 1.2 x 10⁻²⁵, contig. 188, while the δ' amino acid sequence 
yields a contig. of high probability of 1.2 x 10⁻¹⁴, #200. See Appendix C. 
8. The genome for Shewanella putrefaciens is found on the TIGR BLAST 
server. A search for the δ subunit produced the high score of 1.1 x 10⁻⁵⁴ for contig. gsp 230, 
while the search for δ' subunit produced the high score of 6.4 x 10⁻²⁷ for contig. gsp 271. See 
Appendix D. 

9. The genome for Vibrio cholerae is found at 
http://www.tigr.org/cgi-bin/BlastSearch/blast.cgi?organism=v.cholerae. A search for the δ 
subunit produced the high score of 6.9 x 10⁻⁸² for contig. asm 937, while the search for δ' 
subunit produced the high score of 8.1 x 10⁻³⁷ for contig. asm 894. See Appendix E. 

10. The genomes for Pseudomonas aeruginosa (see Appendix F), Salmonella 
typhi (see Appendix G), and Yersinia pestis (see Appendix H) are found at 
http://www.ncbi.nlm.nih.gov/Blast/unfinished genomes. For these, the amino acid sequences 
of E. coli δ and δ' were used in BLAST searches. The high scores, given below, are all 
sufficiently significant to call the identified gene the one that performs the homologous 
function in E. coli: 

Pseudomonas aeruginosa 

    δ  - 7 x 10⁻³⁴     contig. 52 
    δ' - 9 x 10⁻²⁷     contig. 50 

Salmonella typhi 

    δ  - 1 x 10⁻¹⁶¹    contig. 1564 
    δ' - 8 x 10⁻¹⁰     contig. 870 

Yersinia pestis 

    δ  - 1 x 10⁻¹²⁷    contig. 803 
    δ' - 9 x 10⁻⁹⁸     contig. 51 

11. Thus, for Gram negative bacteria such as Haemophilus influenzae, Neisseria 
gonorrhoeae, Shewanella putrefaciens, Vibrio cholerae, Pseudomonas aeruginosa, 
Salmonella typhi, and Yersinia pestis, there is a high level of homology between the δ and δ' 
subunits of those bacteria and the δ and δ' subunits of E. coli. 

12. For other eubacteria, there is significant homology between their δ' subunit 
and that of E. coli. In all eubacteria, the δ subunit can be identified starting with the E. coli δ 
subunit as comparison, but, since it is not as conserved as the δ' subunit, one must "walk" 
from one organism to another, as discussed in ¶ 23 below. 

13. In Himmelreich et al., "Complete Sequence Analysis of the Genome of the
Bacterium Mycoplasma pneumoniae," Nucleic Acids Research 24(22):4420-4449 (1996), the
δ' subunit of Mycoplasma pneumoniae is identified as being homologous to the δ' subunit of
E. coli in Table 1 on page 4426. See Appendix I.

14. In Kunst et al., "The Complete Genome Sequence of the Gram-positive
Bacterium Bacillus subtilis," Nature 390:249-256 (1997), the δ' subunit of Bacillus subtilis is
identified as being homologous to the δ' subunit of E. coli in the table on page 248. See
Appendix J.

15. The genome for Streptococcus pyogenes is found in the University of
Oklahoma server (i.e. http://www.ncbi.nlm.nih.gov/BLAST/tigrbl.html). δ' produced the
high score of 3.3 x 10^-10 for contig 218. See Appendix K.

16. The genome for Enterococcus faecalis is found on the TIGR BLAST search
server. δ' produced the high score of 9.6 x 10^-[illegible] for contig 6277. See Appendix L.

17. The genome for Streptococcus pneumoniae is found on the TIGR BLAST
search server. δ' produced the high score of 2.4 x 10^-12 for contig sp 68. See Appendix M.

18. The genome for Aquifex aeolicus is found in Deckert et al., "The Complete
Genome of the Hyperthermophilic Bacterium Aquifex aeolicus," Nature 392:353-358 (1998)
and at http://www.ncbi.nlm.nih.gov/Blast/unfinished genomes. δ' produced the high score of
8 x 10^-13 (position 1303996-1304394). See Appendix N.

19. The genome for Thermotoga maritima is found in the TIGR BLAST server
page. δ' yields a high score of 3.7 x 10^-15 for contig tm 26. See Appendix O.

20. In Spirochaetes, Tomb et al., "The Complete Genome Sequence of the Gastric
Pathogen Helicobacter pylori," Nature 388:539-547 (1997) (see Appendix P) and Fraser et
al., "Genomic Sequence of a Lyme Disease Spirochaete, Borrelia burgdorferi," Nature
390:580-586 (1997) (see Appendix Q), Helicobacter pylori and Borrelia burgdorferi are
identified to have δ' subunits. For Helicobacter pylori, δ' is listed in the table as HP1231.
For Borrelia burgdorferi, using the NCBI genome search page
(Ncbi.nlm.nih.gov/Blast/unfinished genomes), δ' gives the high score of 8 x 10^-7. See
Appendix R.

21. In Andersson et al., "The Genome Sequence of Rickettsia prowazekii and the
Origin of Mitochondria," Nature 396:133-140 (1998), Rickettsia prowazekii is identified to
have a δ' subunit, identified as RP172. See Appendix S.

22. A large compilation of genome sequences is at the web site
http://www.ncbi.nlm.nih.gov/Blast/unfinished genome.html. The eubacterial genomes were
searched using the δ' subunit of E. coli. All organisms in eubacteria scored very high with
identity levels sufficient to identify the holB gene encoding δ' conclusively. This is seen in
Figure 1 showing a path of one-on-one comparative alignments, each of which starts with E.
coli, and the alignments (attached hereto as Appendix T). In this figure, within the parentheses,
is the percent identity and the ratio of the number of identities (i.e. the numerator) over the
length of the amino acid sequence that was compared (i.e. the denominator). The number
outside of the parentheses is the score obtained in the Blast program (i.e. even a score of 1 x
10^-9 is a sufficiently high score to identify the homologous gene).
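
The percent identity described in paragraph 22 is a simple ratio of identities to compared length. The minimal Python sketch below is illustrative only; it assumes the percentage is truncated to a whole number, an assumption inferred from BLAST output lines of the form "Identities = 35/98 (35%)" in the appended alignments, and the function name percent_identity is hypothetical.

```python
def percent_identity(identities: int, aligned_length: int) -> int:
    """Identical residues over the compared length, as a truncated percentage."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    # Integer division truncates, matching the assumed BLAST display convention.
    return (100 * identities) // aligned_length


# Ratio taken from an alignment header in the appended BLAST output.
example = percent_identity(35, 98)
```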

23. A similar search with the δ subunit of E. coli identified the holA gene of
Neisseria and Thiobacillus as high matches, and holA of other enteric bacteria produced high
scores as well. Repetition of this procedure using Neisseria δ easily allows the identification
of δ in Aquifex aeolicus. Use of Aquifex aeolicus δ identifies δ of Enterococcus (which
identifies Bacillus δ, then Streptococcus δ, then Synechocystis, and then Porphyromonas δ).
Use of Aquifex aeolicus δ also identifies Thermotoga δ, which identifies the Spirochaete
(Borrelia) δ subunit. Use of Thiobacillus δ identifies δ from Helicobacter and Campylobacter.
There is a region at about 100 residues that is rather well conserved in δ across eubacteria and
if this were used, the scores could be even higher yet. Figure 2 shows this "walking"
procedure and shows the scores and percent identities obtained as a result of this procedure
starting from the δ subunit of E. coli as well as alignments (attached hereto at Appendix U).
This figure is substantially the same as Figure 1 but within the parentheses, after the
percentage identity, there is another ratio and another percentage based on homology.
Figure 2 does not show scores for individual Gram negative bacteria of the Enterobacteria
class (called enterics) as they are highly related to E. coli and the scores are very high.
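
The "walking" procedure of paragraph 23 amounts to a breadth-first traversal in which each newly identified δ sequence becomes the next query. The Python sketch below is illustrative only and performs no sequence searching: the edges simply transcribe the steps recited in paragraph 23 (organism names as best read from that text, with the Enterococcus series read as a chain), and the names WALK and walk_paths are hypothetical.

```python
from collections import deque

# Each key is a query organism; the values are organisms whose delta subunit
# that query identified, as recited in paragraph 23 (a transcription, not data).
WALK = {
    "E. coli": ["Neisseria", "Thiobacillus"],
    "Neisseria": ["Aquifex aeolicus"],
    "Aquifex aeolicus": ["Enterococcus", "Thermotoga"],
    "Enterococcus": ["Bacillus"],
    "Bacillus": ["Streptococcus"],
    "Streptococcus": ["Synechocystis"],
    "Synechocystis": ["Porphyromonas"],
    "Thermotoga": ["Borrelia"],
    "Thiobacillus": ["Helicobacter", "Campylobacter"],
}


def walk_paths(graph, start):
    """Breadth-first walk; maps each reached organism to its path from start."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        query = queue.popleft()
        for found in graph.get(query, []):
            if found not in paths:  # keep the first (shortest) route only
                paths[found] = paths[query] + [found]
                queue.append(found)
    return paths


paths = walk_paths(WALK, "E. coli")
```

Looking up paths["Borrelia"], for example, returns the chain of intermediate queries leading from E. coli to that organism.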

24. As demonstrated by all the foregoing, those of ordinary skill in the art would
have been able to (and, in fact, did) identify and isolate the δ and δ' subunits of their
polymerases (and the encoding genes) from eubacteria other than E. coli.

25. I hereby declare that all statements made herein of my own knowledge are true
and that all statements made on information and belief are believed to be true; and further
that these statements were made with the knowledge that willful false statements and the like
so made are punishable by fine or imprisonment, or both, under section 1001 of Title 18 of
the United States Code, and that such willful false statements may jeopardize the validity of
the application or any patent issuing thereon.

Date: [illegible]                    Michael E. O'Donnell

the Escherichia coli Chromosome," J. Bacteriology,
169(12):5692-99 (1987) ("Takase") does not disclose the δ
subunit of polymerase III holoenzyme.

Maki Does Not Disclose Active Subunits

5. I performed my postdoctoral studies with
Dr. Arthur Kornberg at Stanford, and worked in the same
laboratory as Hisaji Maki, the first listed author of the Maki
reference. Accordingly, I am knowledgeable regarding the work
discussed in Maki. Although Figure 4 on page 6551 of Maki
shows bands identified as the subunits of Polymerase III
holoenzyme, there was uncertainty in the Kornberg laboratory
as to the authenticity of the various bands. In particular,
it was unclear whether or not these bands were true subunits.
At the time, the only true and established subunits were the
β, γ, τ, α, and ε proteins, as their genes mapped to classic
temperature sensitive mutant alleles of DNA replication.
However, no other classic temperature sensitive mutants in
replication were left that had not already been identified.
Hence, the bands shown in Figure 4 labelled δ, δ', χ, and θ
may have been either protein contaminants that were still
present in the holoenzyme preparation or proteolytic products
of the larger subunits (e.g. α, τ, γ). Indeed, most people in
the field did not believe that these protein bands were true
subunits of the holoenzyme.

6. Further, I am familiar with the procedures
described in Maki utilized to separate the subunits of the
polymerase III holoenzyme. An important difference between
Maki and my invention is that the proteins of my invention are
purified without the use of denaturants. Maki discloses the
use of a denaturant to separate the subunits, because they are
tightly held into a particle of all ten proteins called the
Pol III holoenzyme. Within this holoenzyme particle, there
are 18 polypeptide chains, because some of the proteins are
present in copies of two or more. Hence, to separate the
subunits, Maki discloses the use of sodium dodecyl sulfate
("SDS") to denature the holoenzyme particle. SDS is one of
the most powerful protein denaturants; it completely
unfolds polypeptide chains to form a rodlike SDS-polypeptide
complex (Lehninger, A., Biochemistry, Worth Publishers, NY,
NY, Third Edition, pp. 180 (1977)) (attached hereto as
Exhibit 1). Samples for the SDS-PAGE technique, such as used
by Maki et al., are typically boiled for 2-5 min. prior to
loading on the gel (See et al., "Estimating Molecular Weights
of Polypeptides by SDS Gel Electrophoresis," In Protein
Structure: A Practical Approach, IRL Press, New York, ed. T.E.
Creighton, pp. 1-21 (1989) (attached hereto as Exhibit 2)).
The use of high temperatures and SDS will cause complete
denaturation of most proteins. Id. Only in some cases is it
possible to renature the proteins from an SDS-PAGE, and, then,
it is often only useful for performing immunoprecipitations
(Scheidtmann, K.H., "Immunological Detection of Proteins of
Known Sequence," In Protein Structure: A Practical Approach,
IRL Press, New York, ed. T.E. Creighton, pp. 93-115 (1989)
(attached hereto as Exhibit 3)). The basis for the antibody
recognition of proteins lacking correct 3D conformation and
full biological activity is that most antibodies recognize the
primary sequence of the protein rather than requiring a
correct three dimensional structure.

7. Once a protein is denatured in SDS, there is
little hope of returning it to an active, or conformationally
correct, form. I have tried this procedure with the δ, δ',
and τ subunits without success in recovering activity.
Generally, one must mince up the SDS gel, extract with a
mortar and pestle, and remove the SDS using Dowex or acetone
precipitation. Often, other denaturants such as urea and/or
guanidine hydrochloride are used in the process. Guanidine
hydrochloride and urea are polar molecules with no substantial
aliphatic character and, therefore, can be efficiently
dialyzed off a protein to permit renaturation in some cases.
However, SDS has a large aliphatic component which binds
tightly to protein and is difficult to remove completely,
making renaturation unlikely.

8. In the examples of the present application, the
subunits are all purified in the absence of denaturant and,
accordingly, are conformationally correct throughout their
purification. This is possible, because, in each case, an
individual gene is used to make each isolated protein subunit.
When only one subunit is produced in large amounts, the low
intracellular level of the other 9 subunits is overwhelmed,
and, thus, the single recombinant protein can be purified and
recovered in isolation away from the other 9 protein subunits
with which it would normally associate. Since interacting
partner subunits are not present, denaturants are not needed
to obtain the subunit in isolation from other subunits.

9. The activity of the proteins purified in
accordance with the present application is demonstrated in the
present specification by virtue of their being functional in a
variety of assays including: (a) binding to other subunits,
(b) activating or stimulating ATPase activity when in
combination with other subunits, (c) activating or stimulating
replication activity when in combination with other subunits,
and (d) activating or stimulating 3'-5' exonuclease activity
when in combination with other subunits.

Proteins Homologous to the δ' Subunit of Polymerase III
Holoenzyme are Contained in Organisms Other than E. coli

10. Most often, a protein active in E. coli has a 
homologous counterpart in higher cells. This is especially 
expected to be true of processes that are essential to life, 
such as DNA replication. Processes underlying other
critical-to-life processes such as transcription, and ribosome-mediated
translation, are also conserved in evolution. Some proteins 
in these processes are so similar in prokaryotes and 
eukaryotes that they can be exchanged for one another in vivo, 
and use of prokaryotic genes can lead to identification of the 
eukaryotic counterpart. All cells utilize DNA for their 
genetic material which must be duplicated to propagate the 
species. Hence, it can be anticipated that the central life 
process of DNA duplication will also be conserved during
evolution such that it will be performed similarly in
prokaryotes and eukaryotes. In fact, prokaryotic replicase
components are similar in structure and function to their 
eukaryotic counterparts and can substitute for the eukaryotic 
components in complex multiprotein replication systems 
involving numerous other proteins. 

11. It was generally understood in the field, 
before the filing date of the present application, that 
mechanisms of replication are widely conserved in organisms 
spanning the evolutionary scale. Homology in structure and 
function of the replication apparatus from prokaryotes to 
eukaryotes had already been established from work of a variety 
of different laboratories. It was known that the 
bacteriophage T4 sliding clamp (gene 45 protein) was 
structurally homologous to the human PCNA clamp (Tsurimoto et 
al. "Functions of Replication Factor C and Proliferating Cell 
Nuclear Antigen: Functional Similarity of DNA Polymerase 
Accessory Proteins From Human Cells and Bacteriophage T4,"
PNAS, 87:1023-1027 (1990) ("Tsurimoto")). Moreover, it was
known that the 3 components of the bacterial replicases (T4
and E. coli) are so homologous in structure and function to
the human 3-component replicase, that the 3 components of
these replicases (clamp, clamp loader, polymerase), could
substitute in the place of the human 3 components in
duplication of the SV40 chromosome with several other human
replication proteins that these replicases need to work with
(Matsumoto et al., PNAS, 87:9712-26 (1990); Tsurimoto). In
other words, the bacterial 3-component replicases were active
with other human replication proteins that coordinate their
actions with the replicase to duplicate the SV40 DNA genome (a
eukaryotic virus). The other human proteins in these assays
that the 3-component replicases of E. coli and phage T4 can
work with are: the 3-subunit human RPA factor, the 4-subunit
human priming machinery, the human topoisomerase, and the SV40
viral large T antigen. The fact that the bacterial replicases
(phage T4 and E. coli) can work with these other replication 
proteins shows that they must have very similar structures and
that the points of contact between these proteins must be
evolutionarily conserved at the level of the DNA sequence of
the genes.

12. Furthermore, in Sanders et al., "Rules
Governing the Efficiency and Polarity of Loading a Tracking
Clamp Protein Onto DNA: Determinants of Enhancement in
Bacteriophage T4 Late Transcription," The EMBO Journal
14(16):3966-76 (1995) ("Sanders"), the common elements of
structure and function of replicative DNA polymerases of
eukaryotes, prokaryotes, and certain viruses are discussed.
It is disclosed that the replicative DNA polymerases of all of
these sources are composed of a core enzyme and a set of
accessory proteins. Further, Stillman, "Smart Machines at the
DNA Replication Fork," Cell 78:725-28 (1994) ("Stillman")
discusses the functional similarity of proteins from E. coli,
humans, and phage T4 that cause replication. Specifically,
these exhibits show that E. coli contains an accessory
complex called γ complex which includes the subunits γ, δ, δ',
ψ, and χ. Further, these exhibits show that homologous
proteins to the γ complex are also present in eukaryotic
(containing RFC complex), phage T4 (containing g44 complex),
and human (containing RFC complex) organisms. Further, it was
known that replicases of humans, E. coli, and the T4 virus
were functionally, as well as structurally, homologous. They
each have 3 components: (1) a DNA polymerase, (2) a
processivity factor (sliding clamp), and (3) a 5-protein
ATPase that functions with the processivity factor to load it
onto DNA.

13. Furthermore, since E. coli and humans are at
opposite ends of the evolutionary scale, it would have been
known to one skilled in the art that all other bacteria and
eukaryotes between E. coli and humans would also have
structural homologues to the δ' subunit. Further, those
skilled in the art recognize the δ' subunit from E. coli has
sequence homology to accessory protein complexes of various
other organisms. For example, in O'Donnell et al., "Homology
in Accessory Proteins of Replicative Polymerases - E. coli to
Humans," Nucleic Acids Research 21(1):1-3 (1993)
("O'Donnell"), a comparison of amino acid sequences shows the
homology between proteins of replicative polymerases of E.
coli, humans, and phage T4. In Carter, et al.,
"Identification, Isolation, and Characterization of the
Structural Gene Encoding the δ' Subunit of Escherichia coli
DNA Polymerase III Holoenzyme," J. of Bacteriology,
175(12):3812-22 (1993), Figure 5 diagrams the homology of the
δ' amino acid sequence to other replication proteins.
Comparison of the δ' amino acid sequence revealed similarity
to the A1 (replication factor C) complex of HeLa cells and to
the gene 44 protein (gp44) of bacteriophage T4. In addition,
amino acid sequence similarity was found to the gene product
of B. subtilis. Id. Further, the structural homology of the
δ' subunit to other replication proteins has been proven to be
true. Cullman, et al., "Characterization of the Five
Replication Factor C Genes of Saccharomyces cerevisiae,"
Molecular and Cellular Biology 15(9):4661-71 (1995). For
example, the genome project of Haemophilus influenzae showed
homologies to all 10 subunits of E. coli DNA polymerase III
holoenzyme, including δ, δ', χ, ψ and θ. Currently,
GenBank also shows homologues to the δ' subunit of E. coli
from a large variety of organisms, including the following:
Procaryotes: Escherichia coli, Haemophilus influenzae,
Micrococcus luteus, Pseudomonas aeruginosa, Bacillus subtilis,
Caulobacter crescentus; Archaebacteria: Thermus thermophilus
(extreme thermophile); Eukaryotes: Drosophila melanogaster
(fly, insect), Caenorhabditis elegans (nematode, worm), Gallus
gallus (chicken), Homo sapiens (man), Saccharomyces cerevisiae
(yeast), and Schizosaccharomyces pombe (yeast).

14. The sequences of the human homologues to δ', and
indeed of the other δ' homologues, are sufficiently homologous to
the δ' subunit of E. coli to provide for identifying and
obtaining the corresponding δ' (holA) gene from these
organisms using the gene encoding the δ' subunit of E. coli in
the following ways: (1) use of the E. coli holA gene, or
fragments of the E. coli gene, as a probe in a Southern
analysis of whole cell DNA from other organisms to identify
the corresponding δ' homologue; (2) use of holA, or its
fragments, as a probe to screen cDNA plasmid libraries of
other organisms; (3) use of the holA gene sequence to
synthesize oligonucleotide primers for PCR to amplify the
corresponding δ' homologue from total genomic DNA from other
organisms; and (4) use of the holA gene sequence to identify
the δ' homologue from a genome sequencing project of other
organisms by sequence comparison to the E. coli holA gene.

15. I have solved the structure of the δ' protein
(in collaboration with Dr. John Kuriyan's laboratory at
Rockefeller University). The δ' protein is composed of three
domains in the shape of a C, and likely performs the clamp
loading action by relative motions between the top and bottom
domains allowing it to open and close the sliding clamp ring
around DNA. The homology of E. coli δ' to the δ' of the
several homologues listed above in paragraph 13 is well above
the level needed to predict that they will have the exact same
chain fold and C-shape.




16. The crystal structures of the E. coli β clamp, the
T4 gp45 clamp, and PCNA have been solved by my lab (in
collaboration with Kuriyan's lab at Rockefeller University).
They have the same chain fold and three dimensional structure
(see below) (β subunit is shown in Kong et al., Cell, 69:425-37
(1992); yeast PCNA is in Krishna, et al., Cell, 79:1233-43
(1994); human PCNA is unpublished, T4 PCNA is unpublished).

Takase Does Not Disclose the δ Subunit of Polymerase III
Holoenzyme

17. Takase relates to the coding of two
lipoproteins by two genes, rlpA and rlpB, located in the leuS-
dacA region on the Escherichia coli chromosome. The rlpA gene
encodes a lipoprotein having a molecular weight of 36K.
Figure 6 of the reference details the sequence of the 36K
lipoprotein gene rlpA and its 5'- and 3'-flanking regions and
the amino acid sequences deduced from the nucleotide sequence.
Figure 7 of the reference details the sequence of the rlpB
gene. At the end of the sequence in Figure 7, the last 230
base pairs constitute a sequence that encodes the first 20-25%
of the holA sequence. Takase did not recognize this to be an
open reading frame of a putative unknown gene, nor did this
reference disclose the gene. See the diagram attached hereto
as Exhibit [illegible]. Further, as shown in Dong, et al., "DNA
Polymerase III Accessory Proteins," J. Biological Chem.,
268(16):11758-765, 11759 n. 3 ("Dong"), Takase's published
sequence was incorrect and incomplete; in fact, the first 54
nucleotides of the δ gene are incorrect by 11 nucleotides.
Thus, the δ protein subunit of polymerase III holoenzyme and
the gene encoding the δ protein subunit of the polymerase III
holoenzyme of the present invention are not disclosed by
Takase.
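
The "first 20-25%" figure in paragraph 17 can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only. It assumes that all 230 base pairs are coding sequence and that the full-length δ protein is 343 amino acids long; that length is not stated in this declaration and is supplied here purely as an assumption, and the names fraction_encoded and ASSUMED_DELTA_LENGTH are hypothetical.

```python
def fraction_encoded(base_pairs: int, protein_length: int) -> float:
    """Fraction of a protein encoded by a stretch of coding DNA (3 bp per codon)."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    complete_codons = base_pairs // 3  # partial trailing codons are ignored
    return complete_codons / protein_length


# Assumption: full-length delta is 343 residues (not taken from the declaration).
ASSUMED_DELTA_LENGTH = 343

fraction = fraction_encoded(230, ASSUMED_DELTA_LENGTH)
```

Under these assumptions the 230 base pairs cover 76 complete codons, which falls within the stated 20-25% range.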

18. I hereby declare that all statements made
herein of my own knowledge are true and that all statements
made on information and belief are believed to be true; and
further that these statements were made with the knowledge
that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under section
1001 of Title 18 of the United States Code, and that such
willful false statements may jeopardize the validity of the
application or any patent issuing thereon.

as possible, with special emphasis on unusual bacteria that had
previously eluded reliable phylogenetic placement.

Woese chose 16S rRNA for construction of a phylogenetic tree be- 
cause it is truly universal and is so highly conserved in structure and 
function that phylogenetic trees are relatively easy to construct. In 
addition, 16S rRNA is an abundant RNA that can be quickly purified 
and analyzed even from small samples of cells. The early stages of 16S 
rRNA sequence analysis culminated in 1977 with a radical hypothe- 
sis. Woese proposed that procaryotes should be divided into two 
groups, called the archaebacteria and the eubacteria, which are as 
different from each other as either is from the eucaryotes. The clear 
implication of this proposal was that archaebacteria, eubacteria, and 
eucaryotes had all descended from an earlier common ancestor that 
did not survive. This hypothesis met with substantial resistance 
within the biological community because it contradicted two common 
but unfounded assumptions — that all bacteria are closely related and 
that bacteria more closely resemble the first living cells than do any 
eucaryotes. 



Archaebacteria Assume Their Rightful Place 5 

Despite widespread scepticism about the value of dividing procary- 
otes into eubacteria and archaebacteria, proponents of the hypothesis 
continued to refine the universal phylogeny based on 16S rRNA (Fig- 
ure 28-24) and to amass supporting biochemical evidence (see Table 
28-5). Today, there is no longer any doubt that all living organisms 
belong to three coequal kingdoms, or lines of descent, and that none 
of these three kingdoms can be thought of as having given rise to the 
others (see Figure 28-24 and Table 28-5). Instead, all three have de- 
scended from an earlier living organism, or progenote, whose nature 
we can only infer by asking what archaebacteria, eubacteria, and the 
eucaryotic nucleus have in common. (The eucaryotic nucleus is di- 
rectly descended from the progenote, but as we shall see, eucaryotic 
organelles such as the mitochondrion and chloroplast were derived 
by endosymbiosis of oxygen-fixing and photosynthetic eubacteria.) 

The universal phylogeny based on 16S-like rRNA reveals other 
startling conclusions. Human beings (Homo sapiens) are in fact more 
closely related to corn (Zea mays) than a Gram-negative bacterium 
(E. coli) is to a Gram-positive bacterium (Bacillus subtilis) (see Figure
4-8 for the significance of Gram staining). Thus, the evolutionary dis- 
tance separating two different bacteria can be greater than the dis- 
tance between a sophisticated plant and the most sophisticated ani- 
mal. The 16S-like phylogeny also provides definitive evidence for the 
endosymbiont hypothesis that mitochondria and chloroplasts are 
descended from eubacteria (see the section entitled The Endosymbi- 
otic Origin of Mitochondria and Chloroplasts). 



The Progenote (First Cell) 
Differed from All Modern Cells 

The universal phylogeny based on 16S-like rRNA tells us that the 
three great kingdoms of living organisms are all descended from a 
progenote. But what was this progenote like? The abundance of 
introns in archaebacteria and eucaryotes suggests that the progenote
had introns but that these were lost during eubacterial evolution as 
the genome was streamlined for very rapid growth (see Table 28-5). 
Similarly, since eubacteria and eucaryotes have ester-linked un- 
branched lipids containing L-glycerophosphate, it is likely that the 
progenote did, too. 



Table 28-5 A Few of the Known Differences Between
Archaebacteria and Eubacteria

Archaebacteria          Eubacteria

Genomic rearrangements common

Figure 28-24

An evolutionary tree can be constructed 
by comparing the complete sequences 
of 21 different 16S and 16S-like riboso- 
mal RNAs (rRNAs). The scale bar rep- 
resents the number of accumulated 
nucleotide differences per sequence 
position in the rRNAs of the various 
organisms. Note that the scale bar can- 
not be recalibrated in billions of years 
without making the unjustified assump- 
tion that mutations accumulate in the 
DNA of all organisms at the same rate 
per unit time. [After N. R. Pace, G. J. 
Olsen, and C. R. Woese, Cell 45 
(1986):325.] 



But we cannot automatically assume that any trait shared by two of 
the three great kingdoms must reflect the nature of the progenote. 
Such a shared trait could also have arisen more than once as different 
organisms independently discovered its value. Independent evolu- 
tion of the same characteristic in separate branches of a phylogenetic 
tree is called convergent evolution. 

Bacteria Are More Highly 
Evolved than Higher Organisms 

Efforts to deduce the nature of the progenote are also confounded by 
the fact that different organisms evolve at different rates. Although 
mutations arise in DNA throughout the life cycle of an organism, the 
effect of these mutations on fitness can only be tested in each new
generation. As a result, rapidly multiplying organisms like bacteria 
and many lower eucaryotes have had a far greater opportunity to lose 
or modify the characteristics of the progenote than have more slowly 
growing higher organisms. This implies that many bacteria, although 
they are no more ancient than eucaryotes (see Figure 28-24), are actu- 
ally more highly evolved. 

The Endosymbiotic Origin of 
Mitochondria and Chloroplasts 66-70

Eucaryotic cells contain a variety of internal organelles, each sur- 
rounded by a lipid bilayer. Many of these organelles (e.g., lysosomes, 
peroxisomes, and the endoplasmic reticulum) are relatively simple 
(see Figure 18-8). But two of them, mitochondria and chloroplasts, are 
about the same size as bacteria and, like bacteria, have circular DNA 
genomes (see Figure 15-17). Mitochondrial and chloroplast genomes 
encode the rRNA and tRNA components of the organellar translation 
apparatus, as well as mRNAs for organellar proteins that are synthe- 
sized within the organelle. The mitochondrial and chloroplast ribo- 
somes are sensitive to antibiotics such as chloramphenicol, which kill 
many bacteria but do not affect the cytoplasmic ribosomes of 
eucaryotes. 

The resemblance of mitochondria and chloroplasts to bacteria natu- 
rally led to the idea that these organelles began as free-living bacteria 
that had been engulfed by a primitive eucaryote (the urcaryote; see 
Figure 28-25). Once internalized, these symbiotic bacteria flourished 
within the host eucaryote as endosymbionts, while supplying the 
host with the ability to generate energy by oxidative phosphorylation 
and (in the case of plants) by photosynthesis. As the 
protomitochondrion and the protochloroplast slowly degenerated 
into specialized organelles, genes were transferred from organellar 
DNA to the nuclear genome of the host, leaving only a handful of 
essential genes behind in the organelle. As a result, most mitochon- 
drial and chloroplast proteins are now encoded in the nuclear DNA, 
translated in the cytoplasm, and transported across the outer 



Figure 28-25 

A possible scheme for early evolution. [After J. E. Darnell and W. F.
Doolittle, Proc. Nat. Acad. Sci. 83 (1986):1271.]

membrane of the organelle. Only those molecular species that cannot
cross the outer membrane (rRNA, tRNA, mRNA, and some proteins) 
must still be encoded in the organellar genome. (Some RNA mole- 
cules can cross the organellar membrane, however, as shown by the 
recent discovery that the RNA component of a mammalian mitochon- 
drial RNA processing enzyme resembling RNase P is encoded by a 
nuclear gene.) As fewer and fewer proteins were encoded within the 
evolving endosymbiont, the organellar translation system no longer 
had to be extremely accurate. Eventually, the organellar translation 
apparatus degenerated into an apparently minimal translation ma- 
chine (see Figure 14-15) and the mitochondrial genetic code under- 
went some surprising changes (see Table 15-9). 

Did mitochondria and chloroplasts evolve from eubacterial or ar- 
chaebacterial progenitors? Comparison of bacterial and eucaryotic 
cytochrome c initially suggested that mitochondria might have de- 
scended from the purple photosynthetic eubacteria. Comparison of 
animal mitochondrial and eubacterial 16S rRNA sequences failed to 
prove this, however, because the animal mitochondrial rRNA se- 
quences had diverged too extensively to permit a meaningful compar- 
ison. Fortunately, plant mitochondrial 16S rRNAs are less divergent, 
and in this case, the comparison led to a surprising result. Plant mito- 
chondria descended from a group of purple eubacteria that includes 
rhizobacteria (see Figure 22-49), agrobacteria (see Figure 22-48), and 
rickettsias (see page 544). Even today, each of these procaryotes is 
able to live within or in very close association with eucaryotic cells. 
This makes the endosymbiont hypothesis all the more plausible and 
allows us to complete a tentative scheme for early evolution (Figure 
28-25). 



Several Bacteriophage T4 Genes 
Contain Self-Splicing Introns 71,72 

No introns have ever been found in E. coli, the most intensively
studied of all procaryotes. The discovery in 1984 of an intron in a
bacteriophage of E. coli therefore came as quite a shock. Three
different bacteriophage T4 genes are now known to have self-splicing
introns resembling the Tetrahymena rRNA intron. Two of the genes 
encode enzymes that convert RNA precursors into DNA precursors 
(thymidylate synthase and the small subunit of ribonucleoside di- 
phosphate reductase; see Figure 28-20). By expressing high levels of 
these two enzymes, T4 diverts the metabolic resources of the infected 
bacterium from making RNA to making DNA, thereby increasing the 
yield of DNA-containing progeny phage. 

Why do self-splicing introns interrupt useful T4 genes? Although it 
is possible that the introns are harmless or hard to get rid of, another 
fascinating possibility is that the introns might actually contribute to 
efficient phage growth. Recall that self-splicing is initiated by attack of 
a free guanine nucleotide on the 5' splice site of the intron (see Fig- 
ures 28-8 and 28-9). When high levels of guanine nucleotides are pres- 
ent early in infection, efficient self-splicing of the transcripts will pro- 
duce mRNAs whose protein products catalyze conversion of RNA 
precursors into DNA precursors. As guanosine nucleotide levels 
begin to fall later in infection, the efficiency of self-splicing will de- 
crease, and the rate of conversion of RNA precursors to DNA precur- 
sors will consequently slow down. T4 thus appears to use the de- 
Contig146                                          +1     51    0.99      1
Contig191                                          -3     46    0.993     3



>Contig200

Length = 93,974
Minus Strand HSPs:
Score = 147 (68.2 bits), Expect = 1.3e-13, P = 1.3e-13 
Identities = 35/98 (35%), Positives = 48/98 (48%), Frame = -3 
Query: 62 CRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLT 121

C CQ Y L + G+D +REV E G KV + + +L+ 

Sbjct: 72738 CGVCQSCTQIDAGRYVDLLEIDAASNTGIDNIREVLENAQYAPTAGKYKVYIIDEVHMLS 72559 
Query: 122 DAAANALLKTLEEPPAETWFFLATREPERLLATLRSRC 159 

+A NA+LKTLEEPP F LAT +P + + T+ SRC 
Sbjct: 72558 KSAFNAMLKTLEEPPEHVKFILATTDPHKVPVTVLSRC 72445 
Score = 98 (45.4 bits), Expect = 1.2e-14, Poisson P(2) = 1.2e-14 
Identities = 19/55 (34%), Positives = 29/55 (52%), Frame = -3 
Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPD 75 

GR HHA L+ G+G + L++ L C+ Q + CG C+ C + AG + D
Sbjct: 72852 GRLHHAYLLTGTRGVGKTTIARILAKSLNCENAQHGEPCGVCQSCTQIDAGRYVD 72688
Score = 44 (20.4 bits), Expect =13., Poisson P(2) = 1.0 
Identities = 8/14 (57%), Positives = 8/14 (57%), Frame = -2 
Query: 238 HEQAPARLHWLATL 251 

H PR HWLA L 
Sbjct: 88705 HTPYPQRAHWLALL 88664 
Score = 44 (20.4 bits), Expect = 2.8, Poisson P(3) = 0.94 
Identities = 10/20 (50%), Positives = 13/20 (65%), Frame = -2 
Query: 315 LLLRIEHYLQPGVVLPVPHL 334

LL + YL+ GV+ PVP L
Sbjct: 10810 LLGMVARYLKLGVLKPVPSL 10751
Score = 43 (19.9 bits), Expect = 1.1, Poisson P(4) = 0.67 
Identities = 8/24 (33%), Positives = 13/24 (54%), Frame = -3 
Query: 137 AETWFFLATREPERLLATLRSRCR 160

A+ W + T+ P+ LA + CR 
Sbjct: 49281 AQWWLVICTQSPKIGLAMANAACR 49210 
Score = 41 (19.0 bits), Expect = 2.6, Poisson P(5) = 0.93 
Identities = 9/14 (64%), Positives = 9/14 (64%), Frame = -1 
Query: 186 ALLAALRLSAGSPG 199 

A L ALR AG PG 
Sbjct: 15380 AFLQALRKGAGQPG 15339 
>Contig189

Length = 45,334
Minus Strand HSPs:
Score = 152 (70.5 bits), Expect = 2.5e-14, P = 2.5e-14 
Identities = 32/87 (36%), Positives = 51/87 (58%), Frame = -1 
Query: 90 VDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLEEPPAETWFFLATREPE 149

+DAVRE+ + + + GG +V+ + A + AAN+LLK LEEPP + F L + + 
Sbjct: 29077 IDAVREIIDNVYLTSVRGGLRVILIHPAESMNVQAANSLLKVLEEPPPQWFLLVSHAAD 28898 
Query: 150 RLLATLRSRCRLHYLAGPPEQYAVTWL 176 

++L T++SRCR LP A+ +L 
Sbjct: 28897 KVLPTIKSRCRKMVLPAPSHGEALAYL 28817
Score = 86 (39.9 bits), Expect = 1.4e-11, Poisson P(2) = 1.4e-11
Identities = 13/27 (48%), Positives = 16/27 (59%), Frame = -1 
Query: 55 GHKSCGHCRGCQLMQAGTHPDYYTLAP 81 

G K CG C C L G+HPD+Y + P 
Sbjct: 29203 GCKPCGECMSCHLFGRGSHPDFYEITP 29123 
Score = 45 (20.9 bits), Expect =26., P = 1.0 
Identities = 8/20 (40%), Positives = 11/20 (55%), Frame = -1 
Query: 44 LSRYLLCQQPQGHKSCGHCR 63

LSR++ +P G C CR 
Sbjct: 25822 LSRHISFNRPSGRFGCSGCR 25763 
Score = 43 (19.9 bits), Expect = 1.9, Poisson P(3) = 0.85 
Identities = 12/46 (26%), Positives = 20/46 (43%), Frame = -1 
Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSRE 179 

E P+E F T + + L++ L L P +Y W+ R+ 
Sbjct: 18817 EMPSENHFT*QTD*QKTRMTLLKNDTFLRALLKQPVEYTPIWMMRQ 18680 
>Contig138

Length = 6169
Minus Strand HSPs:
Score = 95 (44.1 bits), Expect = 2.7e-06, P = 2.7e-06 
Identities = 16/37 (43%), Positives = 25/37 (67%), Frame = -1 
Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDAL 40 

YPWL P + + + ++ G GHHA+LI+A G+G + L 
Sbjct: 1849 YPWLMPIYHQIAQTFDEGLGHHAVLIKADAGLGVERL 1739
>Contig190

Length = 52,290
Plus Strand HSPs:
Score = 46 (21.3 bits), Expect = 19., P = 1.0
Identities = 11/27 (40%), Positives = 19/27 (70%), Frame = +1
Query: 177 SREVTMSQDALLAALRLSAGSPGAALA 203 

S+ ++ S+ AL A++RLSA + +A A 
Sbjct: 48487 SKSLSNSRAALTASVRLSASTTASARA 48567 
Score = 45 (20.9 bits), Expect = 4.7, Poisson P(2) = 0.99 
Identities = 11/27 (40%), Positives = 13/27 (48%), Frame = +1 
Query: 102 EHARLGGAKVVWVTDAALLTDAAANAL 128

E ARL A + +W LL D N L
Sbjct: 8032 EKARLALAMIIWQKPNLLLLDEPTNHL 8112
Score = 44 (20.4 bits), Expect = 1.0, Poisson P(3) = 0.63 
Identities = 10/32 (31%), Positives = 15/32 (46%), Frame = +2 



Query: 45 SRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDY 76

+RY P+G ++ R CQ + G DY
Sbjct: 2855 TRYYPLLHPRGGRAFARPRNCQNLP*GGDADY 2950
Score = 40 (18.5 bits), Expect = 0.033, Poisson P(7) = 0.033
Identities = 9/16 (56%), Positives = 11/16 (68%), Frame = +2
Query: 145 TREPERLLATLRSRCR 160

TR+P RL A+L S R 
Sbjct: 15881 TRKPRRLRASLNSEHR 15928 
Score = 40 (18.5 bits), Expect = 0.033, Poisson P(7) = 0.033 
Identities = 9/25 (36%), Positives = 15/25 (60%), Frame = +1 
Query: 167 PPEQYAVTWLSREVTMSQDALLAAL 191

PP+ T SR +T++ ++AAL
Sbjct: 40399 PPQTRVGTIFSRSLTVTGFTIMAAL 40473

Score = 40 (18.5 bits), Expect = 0.033, Poisson P(7) = 0.033 
Identities = 10/36 (27%), Positives = 14/36 (38%), Frame = +2 
Query: 51 QQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKGKN 86 

+ P +S G GC + T PD G+N 
Sbjct: 3446 RSPADRRSEGKTVGCASRRHQTRPDSERQCQRPGRN 3553
Score = 40 (18.5 bits), Expect = 0.033, Poisson P(7) = 0.033 
Identities = 15/41 (36%), Positives = 20/41 (48%), Frame = +1 
Query: 220 LAYSVPSGDWYSLLAALNHEQAPARLHWLATLLMDALKRHH 260 

LA VPS LLA ++ Q ARL T + LK+ + 

Sbjct: 51634 LARRVPSAFKAKLLADMSDLQKSARLGQPDTTVAQWLKQRN 51756 
Score = 39 (18.1 bits), Expect = 0.51, Poisson P(7) = 0.40 
Identities = 12/45 (26%), Positives = 19/45 (42%), Frame = +3 
Query: 104 ARLGGAKVVWVTDAALLTDAAANALLKTLEEPPAETWFFLATREP 148

A GA++ +++ANL+ PPEW +RP 
Sbjct: 25044 APASGAGILTGMEVRVFSRAPNNKLSRLG*FPPLERWMSGLSRTP 25178 
>Contig199

Length = 81,564
Minus Strand HSPs:
Score = 51 (23.7 bits), Expect = 0.17, Poisson P(2) = 0.16
Identities = 12/33 (36%), Positives = 18/33 (54%), Frame = -3
Query: 189 AALRLSAGSPGAALALFQGDNWQARETLCQALA 221

AA+ LSAGS + +G W + +C+A A 

Sbjct: 13054 AAMILSAGSGSRITPVEKGITWFGLQPICRAAA 12956 
Score = 51 (23.7 bits), Expect = 0.17, Poisson P(2) = 0.16 
Identities = 14/62 (22%), Positives = 21/62 (33%), Frame = -2 
Query: 53 PQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVV 112

P+ CHR + H LP GKN G R+ + L + + 

Sbjct: 80819 PRSRFDCRHARPRYRHRRKQHTGRCRLCPRTGKNRCGRGLGRKLRRSLAPERDAPSKPLI 80640 
Query: 113 WV 114 
W+ 

Sbjct: 80639 WM 80634 
>Contig201 

Length = 92,813 
Minus Strand HSPs : 
Score = 54 (25.0 bits), Expect = 1.6, P = 0.80 
Identities = 13/29 (44%), Positives = 18/29 (62%), Frame = -2 
Query: 188 LAALRLSAGSPGAALALFQGDNWQARETL 216 

L A++L+ G GAA LF D QA E++ 
Sbjct: 55888 LVAVKLNRGELGAAQLLFAPDETQALESV 55802 
Score = 47 (21.8 bits), Expect = 2.3, Poisson P(2) = 0.90 
Identities = 8/32 (25%), Positives = 17/32 (53%), Frame = -3 
Query: 41 IYALSRYLLCQQPQGHKSCGHCRGCQLMQAGT 72 

++ ++++ Q KSC +CR + Q G+ 

Sbjct: 81990 VFGITKFCSRQTMYWRKSCAYCRNWKSSQHGS 81895 
>Contig188

Length = 44,251 
Plus Strand HSPs: 
Score = 45 (20.9 bits), Expect = 4.1, Poisson P(2) = 0.98 



OU Neisseria Gonorrhoeae Sequence Blast 
Server Results 



TBLASTN 1.3.9 [29-Oct-93] 

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, 
and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 
215:403-410. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 
Query= ecoli.delta 

(343 letters) 
Database: /gono/abi/Gcphrap/auto_gono 

114 sequences; 2,133,469 total letters. 
Searching done 

                                                                  Smallest
                                                                  Poisson
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score     P(N)     N

Contig188                                            -2    233    1.2e-25    1
Contig183                                            +2     46    0.26       4
Contig160                                            +2     55    0.72       1
Contig163                                            +2     51    0.77       2
Contig200                                            +1     51    0.77       2
Contig149                                            -1     44    0.85       3
Contig189                                            +2     53    0.91       1
Contig126                                            +2     47    0.91       2
Contig128                                            +1     49    0.95       2
Contig129                                            -3     45    0.98       2
Contig165                                            +1     44    0.98       2
Contig190                                            +1     51    0.99       1
Contig187                                            -3     48    0.991      3



>Contig188

Length = 44,251
Minus Strand HSPs:
Score = 233 (107.6 bits), Expect = 1.2e-25, P = 1.2e-25 
Identities = 56/186 (30%), Positives = 89/186 (47%), Frame = -2 
Query: 10 RAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDWNAIFSLCQ 69 

R + L+ Y+ + G + LL E+ DA+R A QG+ + + D + DWN + 

Sbjct: 12126 RIDTDAPLKPLYVIHGEEELLRIEAVDALRAAAKKQGYLNREAYTADASFDWNELLQTAG 11947 
Query: 70 AMSLFASRQTLLLLLPENGPNAAINEQLLTLTGLLHDDLLLIVRGNKLSKAQENAAWFTA 129 

LFA + L L +P P EL L +D + +V KL K + + WF A 

Sbjct: 11946 NAGLFADLKLLELHIPNGKPGKNGGEALQDFAARLPEDTVTLVLLPKLEKTRLQSKWFAA 11767 
Query: 130 LANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLS 189 

LA + + A LP+W+ R ++ L ++ A + EGNLLA Q + + + L+ 

Sbjct: 11766 LAAKGEVWEAKPVGAAALPQWIRGRLDKIGLGIEADALALFAERVEGNLLAARQEIDKLA 11587
Query: 190 LLWPDG 195 
LL+P G 

Sbjct: 11586 LLYPKG 11569 
Score = 73 (33.7 bits), Expect = 7.6e-08, Poisson P(2) = 7.6e-08 
Identities = 18/62 (29%), Positives = 28/62 (45%), Frame = -2 

Query: 199 LPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGSEPVILLRTLQRELLLLV 258 
+ +AV + AFF A+ R +LLEG EPV+LL + ++ L+ 

Sbjct: 11556 IDEAQTAVANVARFDAFQLAGAWMKADVPRVCRLLDGLEEEGEEPVLLLWAVAEDVRTLI 11377

Query: 259 NL 260 



Sbjct: 11376 RL 11371 
Score = 44 (20.3 bits), Expect = 0.90, Poisson P(3) = 0.60 
Identities = 11/55 (20%), Positives = 26/55 (47%), Frame = -2 

Query : 277 RVWQNRRGMMGEALNRLSQTQLRQAVQLLTRTELTLKQDYGQSVWAELEGLSLLL 331 
R+W + + + + A+ R+S +L A++ +++K W +L + L 

Sbjct: 11319 RLWGDKQTLAPLAVKRI SWRLLDALKTCAQIDRI IKGAEDGDAWTVFKQLWSL 11155 

>Contig183

Length = 34,103
Plus Strand HSPs:
Score = 46 (21.2 bits), Expect = 21., P = 1.0
Identities = 9/29 (31%), Positives = 16/29 (55%), Frame = +2
Query: 293 LSQTQLRQAVQLLTRTELTLKQDYGQSVW 321

L T +A + . RTE +++ +G+ VW 
Sbjct: 21089 LVSTSCNRAGKRACRTEREVRRQFGRDVW 21175 
Score = 43 (19.9 bits), Expect = 1.4, Poisson P(3) = 0.75 
Identities = 8/21 (38%), Positives = 12/21 (57%), Frame = +2 
Query: 259 NLKRQSAHTPLRALFDKHRVW 279

N R+ +TP ++ F K R W 
Sbjct: 10259 NAVRRFFNTPSKSCFSKARAW 10321 
Score = 43 (19.9 bits), Expect = 1.4, Poisson P(3) = 0.75 
Identities = 9/23 (39%), Positives = 14/23 (60%), Frame = +2 
Query: 72 SLFASRQTLLLLLPENGPNAAIN 94 

+L A R L+P++G N+ IN 

Sbjct: 6584 NLSAGRVRTAFLMPKHGKNSKIN 6652 
Score = 43 (19.9 bits), Expect = 1.4, Poisson P(3) = 0.75 
Identities = 12/47 (25%), Positives = 19/47 (40%), Frame = +2 
Query: 113 RGNKLSKAQENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLN 159 

RG+ + A+W + R + C +LP WVA + N 

Sbjct: 10856 RGSYWLSSAVTASWRARMWARLRKGWCSHNANRRLPMWVAQPSSMEN 10996
Score = 42 (19.4 bits), Expect = 0.30, Poisson P(4) = 0.26

Identities = 10/34 (29%), Positives = 17/34 (50%), Frame = +2 
Query: 154 RAKQLNLELDDAANQVLCYCYEGNLLALAQALER 187 

R K++ EL + CYC + L A+ + E+ 
Sbjct: 6923 RYKEVIAELLAKGDAYYCYCSKEELEAMREKAEK 7024 
>Contig160

Length = 17,573 
Plus Strand HSPs: 
Score = 55 (25.4 bits), Expect = 1.3, P = 0.72 
Identities = 12/28 (42%), Positives = 16/28 (57%), Frame = +2 
Query: 124 AAWFTALANRSVQVTCQTPEQAQLPRWV 151 

AA + L +R VT P++AQ RWV 
Sbjct: 8054 AALYIRLCSRLPAVTAPIPQKAQKARWV 8137
Score = 44 (20.3 bits), Expect =3.8, Poisson P(2) = 0.98 
Identities = 11/32 (34%), Positives = 15/32 (46%), Frame = +1 
Query: 163 DDAANQVLCYCYEGNLLALAQALERLSLLWPD 194 

DDA +V G + A LE+ L +PD 

Sbjct: 14512 DDAVKEVESLLMYGQIEAAMDVLEQAVLKYPD 14607
>Contig163

Length = 24,139 
Plus Strand HSPs: 
Score = 51 (23.6 bits), Expect = 4.6, P = 0.99 
Identities = 9/21 (42%), Positives = 13/21 (61%), Frame = +2 
Query: 134 SVQVTCQTPEQAQLPRWVAAR 154 

SV++ C +P A LP W+ R 
Sbjct: 13157 SVRLRCPSPSDATLPFWLRRR 13219 
Score = 46 (21.2 bits), Expect = 1.5, Poisson P(2) = 0.77 
Identities = 9/18 (50%), Positives = 12/18 (66%), Frame = +1 
Query: 105 HDDLLLIVRGNKLSKAQE 122 

HDDLLL+++G QE 
Sbjct: 5695 HDDLLLVLKGAANKLVQE 5748



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998] 
Reference: Gish, Warren (1994-1997). unpublished. 

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. 
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 
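
The Expect (E) and P values printed for each HSP in these reports appear to be tied together by Poisson statistics: the probability of at least one chance hit with that score is P = 1 - exp(-E). A minimal Python sketch, assuming that relation holds for these listings (the function name `p_from_expect` is illustrative and is not part of any BLAST program), reproduces several E/P pairs printed in the reports:

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance HSP, assuming Poisson statistics."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-expect)


# (Expect, P) pairs copied from HSP summary lines in the reports.
reported = [(1.3, 0.72), (1.6, 0.80), (0.47, 0.37), (0.17, 0.16)]

for expect, p in reported:
    print(f"E = {expect:<5} computed P = {p_from_expect(expect):.3f} (reported {p})")
```

For very small E the two numbers coincide, which is why the strongest hits show identical Expect and P values.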

Query= delta prime 

(334 letters) 



Database: /usr/local/db/s_putrefaciens 

2430 sequences; 5,974,789 total letters. 
Searching....10....20....30....40....50....60....70....80....90....100% done 



                                                                  Smallest
                                                                    Sum
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score     P(N)     N

gsp_271                                              +3    302    6.4e-27    1
gsp_387                                              +1    192    1.9e-13    1



>gsp_271 

Length = 11,991 

Plus Strand HSPs : 

Score = 302 (106.3 bits), Expect = 6.4e-27, P = 6.4e-27 
Identities = 84/274 (30%), Positives = 132/274 (48%), Frame = +3 

Query: 5 PWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRG 64 

PWL + + Q + HAL+ G + L ++R +C QP CG C + 

Sbjct: 1842 PWLDVPRQAFLTQLQTQKVPHAQLVGIDSAYGGELLSVFMARAAMCSQPTHTGGCGFCKS 2021 

Query: 65 CQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAA 124 

CQL AG HPD+Y + E + + VD +RE+ +L+ A+ G +V + + L A+ 
Sbjct: 2022 CQLFDAGNHPDFYQI--EADGHQIKVDQIRELCSRLSATAQQSGRRVAIIHHSERLNSAS 2195 

Query: 125 ANALLKTLEEPPAETWFFLATREPERLLATLRSRC-RLHYLAGPPEQYAVTWLSREVTMS 183 

ANALLKTLEEP +T L + P RL+AT+ SRC RL ++A P + WL ++ + 

Sbjct: 2196 ANALLKTLEEPGKDTLLLLHSDTPARLMATISSRCQRLPFVA-PSKTLIKNWLIQQCQIQ 2372 

Query: 184 QDALLAALRLSAGSPGAALALFQGDNWQAR-ETLC QALAYSVPSGDWYSLLAALNHE 239 

+D L + G A +L +R +TL + A S+ SG + L ++ + 

Sbjct: 2373 EDVTWC-LSWGGPLKLAESLQSNSTQPSRYQTLLGFRKDWAQSLSSGHLCASLLIISEQ 2549 

Query: 240 QAPARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAEL 277 

Q L L LL L ++ + L A++ 

Sbjct: 2550 QIIDALKVLYLLLRQILLKNGNQDAYVQAQIGNLAAKV 2663 
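
The Frame value reported with each HSP can be checked against the Sbjct coordinates. A minimal Python sketch, assuming the usual convention that plus-strand frames are counted from the first base of the contig and minus-strand frames from its last base (the helper name `reading_frame` is illustrative, not a BLAST function):

```python
def reading_frame(sbjct_start: int, sbjct_end: int, contig_length: int) -> int:
    """Return the signed reading frame (+1..+3 or -1..-3) of a TBLASTN hit.

    Coordinates are 1-based. For a minus-strand hit the Sbjct start is the
    larger coordinate, so the offset is measured from the contig's last base.
    """
    if sbjct_start <= sbjct_end:  # plus strand
        return (sbjct_start - 1) % 3 + 1
    return -((contig_length - sbjct_start) % 3 + 1)  # minus strand


# gsp_271: Length = 11,991, first Sbjct line runs 1842 -> 2021, reported Frame = +3
print(reading_frame(1842, 2021, 11991))
# Contig200: Length = 93,974, first Sbjct line runs 72738 -> 72559, reported Frame = -3
print(reading_frame(72738, 72559, 93974))
```

A disagreement between the computed and printed frame would suggest that a coordinate or contig length was misread in the scan.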



>gsp_387 

Length = 3834 

Plus Strand HSPs : 

Score = 192 (67.6 bits), Expect = 1.9e-13, P = 1.9e-13 
Identities = 59/185 (31%), Positives = 86/185 (46%), Frame = +1 



Query:  (start coordinate illegible)
Sbjct:  562
Query:   82
Sbjct:  736
Query:  142
Sbjct:  913
Query:  197
Sbjct: 1093

[Sequence lines not recoverable from the scan; surviving alignment fragments, in scan order:]
R HHA L
G+G +L
+ + L C+
CG C C +
D
T VD RE+ + +
LAT +P++L T+ SRC
KV + + +L+ + + NALLKTLEEPP
+Q
T L
+T Q
-DALLAALRLSAG 196
+AL + + G
AL+L

Parameters:
  B=5
  ctxfactor=6.00
  E=10

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.321   0.136   0.423    same    same    same
               Q=9,R=2         0.244   0.0300  0.180    n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length     E    S  W   T   X   E2     S2
   +0      0      334      334        10.  62  3  13  22  0.069   37
                                                      33  0.063   42



Statistics:

  Database: /usr/local/db/s_putrefaciens
    Title: /usr/local/db/s_putrefaciens
    Release date: unknown
    Posted date: 10:07 AM EST Dec 15, 1998
    Format: BLAST
  # of letters in database: 5,974,789
  # of sequences in database: 2430
  # of database sequences satisfying E: 2
  No. of states in DFA: 540 (57 KB)
  Total size of DFA: 97 KB (128 KB)
  Time to generate neighborhood: 0.00u 0.00s 0.00t Elapsed: 00:00:00
  No. of threads or processors used: 1
  Search cpu time: 4.81u 0.01s 4.82t Elapsed: 00:00:05
  Total cpu time: 4.84u 0.01s 4.85t Elapsed: 00:00:05
  Start: Wed Mar 17 09:14:58 1999 End: Wed Mar 17 09:15:03 1999
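
The bit scores shown in parentheses after each raw score can be recovered from the Lambda values listed under Parameters. A minimal Python sketch, assuming the reported bits equal Lambda x S / ln 2 with the "As Used" Lambda of 0.244 on the Q=9,R=2 row (this is an inference from the printed numbers, not a statement of how the program is implemented; `bits_from_raw` is an illustrative name):

```python
import math

# "As Used" Lambda from the Q=9,R=2 row of the Parameters table.
GAPPED_LAMBDA = 0.244


def bits_from_raw(raw_score: int, lam: float = GAPPED_LAMBDA) -> float:
    """Convert a raw alignment score to bits, assuming bits = lambda * S / ln 2."""
    return lam * raw_score / math.log(2)


# Raw scores and bit scores as printed for gsp_271 and gsp_387.
for raw, reported_bits in [(302, 106.3), (192, 67.6)]:
    print(raw, round(bits_from_raw(raw), 1), reported_bits)
```

The same conversion also matches the other reports run with this program version, for example 564 raw and 198.5 bits for gsp_230.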



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998] 
Reference: Gish, Warren (1994-1997). unpublished. 

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. 
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 

Query= e coli delta 
(343 letters) 



Database: /usr/local/db/s_putrefaciens 

2430 sequences; 5,974,789 total letters. 
Searching....10....20....30....40....50....60....70....80....90....100% done 



                                                                  Smallest
                                                                    Sum
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score     P(N)     N

gsp_230                                              +2    564    1.1e-54    1
gsp_343                                              +1     70    0.999      1



>gsp_230 

Length = 21,837 
Plus Strand HSPs: 

Score = 564 (198.5 bits), Expect = 1.1e-54, P = 1.1e-54 
Identities = 135/343 (39%), Positives = 184/343 (53%), Frame = +2 

Query: 2 IRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDW 61 

+R+YP+QL LN LA YL+ G+DP LL+ S+D +RQ A QGFEE + +W 

Sbjct: 14210 MRVYPDQLSRHLNP-LHACYLIFGDDPWLLETSKDQIRQAAKRQGFEERVQLIQETGFNW 14386 

Query: 62 NAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQLLTLTGLLHDDLLLIVRGNKLSKAQ 121 

+ QAMSLF+SR+ + L LP P A + L +L D+LLI+ G KL+ Q 

Sbjct: 14387 GDLTQEWQAMSLFSSRRIIELTLPSAKPGADGSAALQSLLQTPSPDVLLILEGPKLASEQ 14566 

Query: 122 ENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLAL 181 

N+ WF L + + + C TPE Q RW+ +R L L A +L YEGNLLA 

Sbjct: 14567 TNSKWFKTLDSLGIYLPCTTPEGDQFRRWLDSRIAHFKLNLQPDARAMLYSLYEGNLLAA 14746 

Query: 182 AQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGS 241 

QA++ LLLP + + D + FTF DALL + A H+L QL EG+ 

Sbjct: 14747 DQAMQLLQLLSPSKPIGADELSHYFEDQSRFTVFQLTDALLNNRQDSAQHMLAQLNGEGT 14926 

Query: 242 EPVILLRTLQRELLLLVNLKRQSAH-TPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300 

ILL L +EL LL++LK + A +PL +LF KHR+W R+ + AL RLS Q+ 
Sbjct: 14927 AMPILLWALFKELQLLLSLKSEQAQGSPLNSLFGKHRIWDKRKPLYQTALQRLSLAQIEH 15106 



Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLL---CHKPLADVFID 342 

+ ++ EL LKQ G WLLLL HLA + +D 

Sbjct: 15107 MLAFASKLELNLKQ-LGHEDWTGLSHLCLLFDPKAHSHLAHINLD 15238 
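
Each HSP summary line packs the identity and positive counts, the alignment length, and the frame into one string. A minimal Python sketch for pulling those fields out (the regular expression and the name `parse_hsp_summary` are illustrative assumptions, written against the line format seen in these reports; the percentages here appear to be truncated rather than rounded, so the sketch uses integer division):

```python
import re

# Matches lines such as:
# "Identities = 135/343 (39%), Positives = 184/343 (53%), Frame = +2"
HSP_LINE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+).*?Frame\s*=\s*([+-]\d)"
)


def parse_hsp_summary(line: str) -> dict:
    """Extract identity/positive counts, alignment length, and frame."""
    match = HSP_LINE.search(line)
    if match is None:
        raise ValueError(f"not an HSP summary line: {line!r}")
    identities, length, positives, length_pos, frame = match.groups()
    return {
        "identities": int(identities),
        "positives": int(positives),
        "length": int(length),
        # Integer division mirrors the truncated percentages in the reports.
        "pct_identity": int(identities) * 100 // int(length),
        "pct_positive": int(positives) * 100 // int(length_pos),
        "frame": int(frame),
    }


line = "Identities = 135/343 (39%), Positives = 184/343 (53%), Frame = +2"
print(parse_hsp_summary(line))
```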



>gsp_343 

Length =6977 
Plus Strand HSPs : 
Score = 70 (24.6 bits), Expect = 6.5, P = 1.00 

Identities = 33/127 (25%), Positives = 57/127 (44%), Frame = +1 

Query: 19 AAYLLLGNDPLL--LQESQDAVRQVAAAQGFEEHH----TFSIDPNTDW-NAIFSLCQAM 71 

AA++L N + E QDA + + + Q +HH TFSID N DW + S + 
Sbjct: 466 AAHVLEDNGQQISGFIEVQDADKGQSSMQAMTDHHAAHGTFSIDVNGDWVYQLDSRRPDV 645 

Query: 72 SLFASRQTLLLLLPENGPNAAINEQLLTLTGLLHDDLLLIVRGNKLSKAQENAAWFTALA 131 

+ +TLL + + + +E +T+ G ++ +L + Q +A T A 

Sbjct: 646 QALKAGETLLETITVHSADGTPHEVNITIHGQNDGAVISGADTGQLVEDQNVSAASTLEA 825 

Query: 132 NRSVQVT 138 

+ + VT 
Sbjct: 826 HGQLTVT 846 



Parameters:
  B=5
  ctxfactor=6.00
  E=10

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.322   0.135   0.398    same    same    same
               Q=9,R=2         0.244   0.0300  0.180    n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length     E    S  W   T   X   E2     S2
   +0      0      343      343        10.  62  3  13  22  0.067   37
                                                      33  0.063   42



Statistics:

  Database: /usr/local/db/s_putrefaciens
    Title: /usr/local/db/s_putrefaciens
    Release date: unknown
    Posted date: 10:07 AM EST Dec 15, 1998
    Format: BLAST
  # of letters in database: 5,974,789
  # of sequences in database: 2430
  # of database sequences satisfying E: 2
  No. of states in DFA: 531 (57 KB)
  Total size of DFA: 90 KB (128 KB)
  Time to generate neighborhood: 0.01u 0.00s 0.01t Elapsed: 00:00:00
  No. of threads or processors used: 1
  Search cpu time: 4.46u 0.00s 4.46t Elapsed: 00:00:04
  Total cpu time: 4.49u 0.00s 4.49t Elapsed: 00:00:04
  Start: Wed Mar 17 09:22:40 1999 End: Wed Mar 17 09:22:44 1999



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998] 
Reference: Gish, Warren (1994-1997). unpublished. 

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. 
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 

Query= e coli delta 
(343 letters) 



Database: /usr/local/db/v_cholerae 

694 sequences; 4,145,671 total letters. 
Searching....10....20....30....40....50....60....70....80....90....100% done 



                                                                  Smallest
                                                                    Sum
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score     P(N)     N

asm937                                               +2    817    6.9e-82    1
asm843                                               +3     68    0.9995     1



>asm937 

Length = 6994 
Plus Strand HSPs : 



Score = 817 (287.6 bits), Expect = 6.9e-82, P = 6.9e-82 
Identities = 168/338 (49%), Positives = 232/338 (68%), Frame = +2 

Query:    2
          +R+Y E+L L++ L YL+ GN+PLLLQE++ A+ + A AQGF E H FS D DW
Sbjct: 1166

Query:   62
          NA++ CQA+SLF+SRQ + + + PE+G NA ++L L G LH D+LL+V G KL+KAQ
Sbjct: 1346

Query:  122
          ENAAWF LA + + V C TPE ++LP++V L L+ D A Q+L +EGNL AL
Sbjct: 1526

Query:  182
          AQ+LE+L+LL+PDG LTL R+E++++ HFTP+HW+DALL GK+ RA IL+QL LE S
Sbjct: 1706

Query:  242
          EP+IL+RT Q+EL L+ +++ L +LFD++RVWQNRR + AL RL L +
Sbjct: 1886

Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADV 339 

V + LT+ EL K Y Q VW L+ LSL C+ P A+ + 
Sbjct: 2066 LVGILTQAELLAKTQYEQPVWPILQQLSLECCN-PQANL 2179 
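
Because TBLASTN translates the nucleotide database, every aligned subject residue should account for exactly three bases of the Sbjct coordinate span, while gap characters account for none. A minimal Python sketch of that consistency check (the name `sbjct_span_consistent` is illustrative; the check is offered as a way to spot lines where scanning dropped or added characters, not as part of the original analysis):

```python
def sbjct_span_consistent(start: int, end: int, aligned: str) -> bool:
    """True if the nucleotide span equals 3 x the number of aligned residues.

    Gap characters ('-') consume no subject bases; translated stop codons
    ('*') consume three bases like any other residue.
    """
    residues = sum(1 for ch in aligned if ch != "-" and not ch.isspace())
    return abs(end - start) + 1 == 3 * residues


# Last Sbjct line of the asm937 alignment (plus strand, one gap).
print(sbjct_span_consistent(2066, 2179, "LVGILTQAELLAKTQYEQPVWPILQQLSLECCN-PQANL"))
# A minus-strand Sbjct line from the Contig200 alignment.
print(sbjct_span_consistent(72558, 72445, "KSAFNAMLKTLEEPPEHVKFILATTDPHKVPVTVLSRC"))
```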



>asm843 

Length = 26,802 

Plus Strand HSPs: 

Score = 68 (23.9 bits), Expect = 7.6, P = 1.00 

Identities = 22/63 (34%), Positives = 31/63 (49%), Frame = +3 

Query: 115 NKLSKAQENAAWFT-ALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYC 173 

N LS Q ++ T AL +QV Q PE A+ +WVA ++ EL +A + 
Sbjct: 15237 NVLSVYQPSSLVLTPALLMALIQWKQAPELAKSLQWVAVGGARVAAELIHSARALGIPA 15416 

Query: 174 YEG 176 
YEG 

Sbjct: 15417 YEG 15425 



Parameters:
  B=5
  ctxfactor=6.00
  E=10

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.322   0.135   0.398    same    same    same
               Q=9,R=2         0.244   0.0300  0.180    n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length     E    S  W   T   X   E2     S2
   +0      0      343      343        10.  60  3  13  22  0.067   37
                                                      33  0.063   42


Statistics:

  Database: /usr/local/db/v_cholerae
    Title: /usr/local/db/v_cholerae
    Release date: unknown
    Posted date: 12:58 PM EST Dec 11, 1998
    Format: BLAST
  # of letters in database: 4,145,671
  # of sequences in database: 694
  # of database sequences satisfying E: 2
  No. of states in DFA: 531 (57 KB)
  Total size of DFA: 90 KB (128 KB)
  Time to generate neighborhood: 0.01u 0.00s 0.01t Elapsed: 00:00:00
  No. of threads or processors used: 1
  Search cpu time: 3.25u 0.02s 3.27t Elapsed: 00:00:03
  Total cpu time: 3.26u 0.03s 3.29t Elapsed: 00:00:03
  Start: Wed Mar 17 09:24:47 1999 End: Wed Mar 17 09:24:50 1999



The top-scoring match came from this contig (up to 1000 bp on either side of 
the hit are shown): 



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998] 
Reference: Gish, Warren (1994-1997). unpublished. 

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. 
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 

Query= delta prime 

(334 letters) 

Database : /usr/local/db/v_cholerae 

694 sequences; 4,145,671 total letters. 
Searching 10 20 30 40 50 60 70 80 90 100% done 

                                                                  Smallest
                                                                    Sum
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score     P(N)     N

asm894                                               -1    394    8.1e-37    1
asm864                                               -3    178    6.1e-12    1
asm959                                               +3     79    0.37       1


>asm894 

Length = 19,711 

Minus Strand HSPs : 

Score = 394 (138.7 bits), Expect = 8.1e-37, P = 8.1e-37 
Identities = 106/313 (33%), Positives = 159/313 (50%), Frame = -1 

Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCR 63 

YPWL P ++ A AG+ A LIQA G+G ++L+ ++R L+C Q + CG C 
Sbjct: 18034 YPWLVPWQPWQAGLAAGKISSATLIQASEGVGVESLVELMARTLMCTSSQS-EPCGFCH 17858 

Query: 64 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDA 123 

C LMQ+G HPD++ + PEK ++ V+ +R++ E ++L G + + + + A + + + 

Sbjct: 17857 SCGLMQSGNHPDFHWKPEKIGKSITVEQIRQMNRIAQESSQLSGYRLIVIEPADAMNES 17678 

Query: 124 AANALLKTLEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMS 183 

+ANALLKTLEEP F L T + LL T+ SRC+ LP V WL + + + 

Sbjct: 17677 SANALLKTLEEPAPNCLFILVTSRIKHLLPTIVSRCQRLVLPAPTTALWEWLKGQ-GIT 17501 

Query: 184 QDALLAALRLSAGSPGAALA-LFQGDNWQARETLCQALAYSVPSGDWYSLLA--ALNHEQ 240 

A AL L A SP A + +G + E Q + + SGD + L AL 
Sbjct: 17500 TPAY--ALHLCADSPLKTRAFMLEGGAEKYHELESQLM--NALSGDVNAQLKCIALIDAD 17333 



Query: 241 APARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLSPSRLQAILGDVCHIREQL 300 
L+W+ +LDAKHGQ P ALA + S+L + + EQL 



Sbjct: 17332 LTTHLYWVWCVLTDAQKIHFGVQQDY---YPPASAALAGRFTYSKLHVQTASLERLMEQL 17162 



Query: 301 MSVTGINRELLITDLL 316 

+G+N ELL+ L 
Sbjct: 17161 NQFSGLNTELLLLQWL 17114 



>asm864 

Length = 23,778 

Minus Strand HSPs : 

Score = 178 (62.7 bits), Expect = 6.1e-12, P = 6.1e-12 
Identities = 46/143 (32%), Positives = 68/143 (47%), Frame = -3 

Query: 22 RGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 81 

R HHA L G+G + ++ L C+ CG C CQ + G D + 

Sbjct: 14509 RLHHAYLFSGTRGVGKTTIGRLFAKGLNCETGITATPCGQCATCQEIDQGRFVDLLEI-- 14336 

Query: 82 EKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLEEPPAETWF 141 

+ T V+ RE+ + + G KV + + +L+ + NALLKTLEEPP F 

Sbjct: 14335 DAASRTK-VEDTRELLDNVQYKPARGRFKVYLIDEVHMLSRHSFNALLKTLEEPPEYVKF 14159 

Query: 142 FLATREPERLLATLRSRCRLHYL 164 

LAT + P++L T+ SRC +L 
Sbjct: 14158 LLATTDPQKLPVTILSRCLQFHL 14090 

>asm959 

Length = 15,780 
Plus Strand HSPs: 
Score = 79 (27.8 bits), Expect = 0.47, P = 0.37 

Identities = 35/115 (30%), Positives = 52/115 (45%), Frame = +3 

Query:  174 TW-LSREVTMS QDALLAALRLSAGSP GAALALFQGDNWQARETLCQALAYSVPS 226
            +W LS V+ QD L AA L+ + G +AL G A + ++A S P+
Sbjct: 1047

Query:  227
            G+W + + A + W+ATL D L R+ G + V E+ HL
Sbjct: 1215

Query:  286
Sbjct: 1389

Parameters:
  B=5
  ctxfactor=6.00
  E=10

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.321   0.136   0.423    same    same    same
               Q=9,R=2         0.244   0.0300  0.180    n/a     n/a     n/a



Query 



BLAST Search Results 



WARNING: These microbial genomes from are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ . 
Please see details 



NOTE: 



This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs 



Commencing search, please wait for results. 



TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 

Altschul, Stephen F. , Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= ecoli.delta 

(343 letters) 

Searching done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQs 



                                                                   Score     E
Sequences producing significant alignments:                       (bits)   Value

gnl|PAGP|Paeruginosa_Contig52 Pseudomonas aeruginosa unfinished     139    7e-34
gnl|PAGP|Paeruginosa_Contig44 Pseudomonas aeruginosa unfinished      31    0.45
gnl|PAGP|Paeruginosa_Contig53 Pseudomonas aeruginosa unfinished      27    5.1
gnl|PAGP|Paeruginosa_Contig49 Pseudomonas aeruginosa unfinished      27    5.1
gnl|PAGP|Paeruginosa_Contig47 Pseudomonas aeruginosa unfinished      27    5.1

gnl|PAGP|Paeruginosa_Contig52 Pseudomonas aeruginosa unfinished fragment of complete ger 
Score = 139 bits (347), Expect = 7e-34 

Identities = 106/329 (32%), Positives = 155/329 (46%), Gaps = 8/329 (2%) 
Frame = -2 

Query: 2 IRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDW 61 

++L P QL L L Y++ G+ + PLL Q E+ DA+RQ + F E F+ + N DW 

Sbjct: 245226 MKLTPAQLAKHLQGPLAPVYVVSGDEPLLCQEACDAIRQACRERDFGERQVFNAEANFDW 245047 

Query: 62 NAIFSLCQAMSLFASRQTLLLLLPENGPN AAINEQXXXXXXXXXXXXXXIVRGNKLS 118 

e , . + ++SLFA ++ + L LP P AAI ++ + KL 

Sbjct: 245046 GLLLEAGASLSLFAEKRLIELRLPSGKPGDKGAAILQEYLQRPPEDTVLLLGLP KLD 244876 

Query: 119 KAQENAAWFTAL-ANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEG 176 






+ + WAL N++ + QLP+W+ R Q L A +++ EG 

Sbjct: 244875 GSTQKTKWAKALIDGNAAQFIQVWPVDVHQLPQWIRQRLSQAGLSASPEALELIAARVEG 244696 

Query: 177 NLLALAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQL 236 

NLLA AQ +E+L LL ++ V+ AV D+A F F +DA L G++ AL IL+ L 
Sbjct: 244695 NLLAAAQEIEKLKLLAEGNQIDAATVQAAVADSARFDVFGLIDAALGGEAAHALRILEGL 244516 

Query: 237 RLEGSE-PVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHR--VWQNRRGMMGEALNRL 293 

ow R EG E PVI PL F + R VW RR ++ AL R 

Sbjct: 244515 RGEGIEPPVILWGLAREIRLLAGLSQQYGQGIPLEKAFAQARPPVWDKRRPLLTRALQRH 244336 

Query: 294 SQTQLRQAVQLLTRTELTLKQDYGQSVWAELEGLSLL 330 

S ++ Q+L +L Q GQ+ + GLSLL 

Sbjct: 244335 S SSRWN QMLRDAQL I DAQ 1 KGQAPGS PWSGLSLL 244234 

Score =29.0 bits (63), Expect =1.3 

Identities = 20/50 (40%), Positives = 28/50 (56%) 

Frame = +2 

Query: 10 RAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNT 59 

RA+L +GL LLL + +Q S+ AVR++AA G T DP+T 

Sbjct: 87335 RARLAQGLSLTDLLLEH AIQPSRSAVRRLAAGGGLRLDGTPVSDPDT 87475 

gnl|PAGP|Paeruginosa_Contig44 Pseudomonas aeruginosa unfinished fragment of complete ger 

Score =30.5 bits (67), Expect =0.45 

Identities = 19/54 (35%), Positives = 25/54 (46%) 

Frame = +3 

Query: 274 DKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTRTELTLKQDYGQSVWAELEGL 327 

D + Q R +G L +L QTQ V LL ++ y V+A LEGL 

Sbjct: 157899 DGEAIAQLRTDELGGLLRKLRQTQQMALVGLLRNQDVATSLGYLARVYARLEGL 158060 

gnl|PAGP|Paeruginosa_Contig53 Pseudomonas aeruginosa unfinished fragment of complete ger 
Score = 27.0 bits (58), Expect = 5.1 

Identities = 18/53 (33%), Positives = 33/53 (61%), Gaps = 4/53 (7%) 

Query: 156 KQLNLELDDAANQVLC-YCYEGNLLALAQALERLSLLWPDGKL---TLPRVEQAVND 208 

K+ ++ + AA LC + + GN+ LA +ERL+++ P G + LP+ + V+D 

Sbjct: 462347 KRGSIRFNSAAIMSLCRHDWPGNVRELANLVERLAIMHPYGVIGVGELPKKFRHVDD 462517 

gnl|PAGP|Paeruginosa_Contig49 Pseudomonas aeruginosa unfinished fragment of complete ger 

Length = 476032 
Score =27.0 bits (58), Expect =5.1 

Identities = 14/37 (37%), Positives = 24/37 (64%), Gaps = 7/37 (18%) 

Query: 124 AAWFTALANRSVQVTCQT P EQAQLPRWVAARAKQLNL 160 

A+AN V + +T E + +LPRW++ R +++L 

Sbjct: 2694 AALMLAMANMRVLLAARTKRPSLPAFEEVRLPRWLSGRTMKISL 2563 







WARNING: These microbial genomes from are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 



NOTE: 



This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 



Commencing search, please wait for results. 



TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= deltaprime.ecoli 
(334 letters) 

Searching...done 



If you have any problems or questions with the results of this search 
please refer to the BLAST FAQs 



Sequences producing significant alignments:                          Score     E 
                                                                     (bits)  Value 

gnl|PAGP|Paeruginosa_Contig50 Pseudomonas aeruginosa unfinished        115   9e-27 
gnl|PAGP|Paeruginosa_Contig53 Pseudomonas aeruginosa unfinished         62   1e-10 
gnl|PAGP|Paeruginosa_Contig47 Pseudomonas aeruginosa unfinished         29   1.00 
gnl|PAGP|Paeruginosa_Contig45 Pseudomonas aeruginosa unfinished         29   1.00 
gnl|PAGP|Paeruginosa_Contig46 Pseudomonas aeruginosa unfinished   [illegible] 1.3 
gnl|PAGP|Paeruginosa_Contig52 Pseudomonas aeruginosa unfinished         28   2.9 



gnl|PAGP|Paeruginosa_Contig50 Pseudomonas aeruginosa unfinished fragment 

Length = 798876 

Score = 115 bits (286), Expect = 9e-27 

Identities = 84/323 (26%), Positives = 139/323 (43%), Gaps = 11/323 (3%) 
Frame = +2 



Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCR 63 

. YPW + + +L Q HA L+ G+G AL + LLCQ+P +CG C+ 

Sbjct: 521618 YPWQQALWSQLGGRAQHA HAYLLYGPAGIGKRALAEHWAAQLLCQRPAAAGACGECK 521788 

Query: 64 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXX 123 

. CQL+ AGTHPDY+ L PE+ + + VD VR++ + + A+LGG KW + 

Sbjct: 521789 ACQLLAAGTHPDYFVLEPEEAEKPIRVDQVRDLVGFWQTAQLGGRKWLLEPAEAMNVN 521968 



Query: 124 XXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMS 183 

EEP +T L + +P RLL T++SRC P ++ W L+R + 

Sbjct: 521969 AANALLKSLEEPSGDTVLLLISHQPSRLLPTIKSRCVQQACPLPGAAASLEWLARALPDE 522148 

Query: 184 QDXXXXXXXXXXXXXXXXXXXFQGDNWQ ARETLCQALAYSVPSGDWYSLLA 234 

Sbjct: 522149 PAEALEELLALSGGSPLTAQRLHGQGVREQRAQWEGVKKLLKQQIAASPLAESW 522313 

Query: 235 ALNHEQAPARLHWLATLLMDALKRH - - HGAAQVTNVDVPGLVAELANHLS PSRLQAI LGD 292 

N P w + L+ H + D+ ++ L + +++ A+ 

Sbjct: 522314 -NSVPLPLLFDWFCDWTLGILRYQLTHDEEGLGLADMRKVIQYLGDKSGQAKVLAMQDW 522487 

Query: 293 VCHIREQLMSVTGINRELLITDLLLRIEHYLQPG 326 " " 

+ R+++++ +NR LL+ LL++ PG 
Sbjct: 522488 LLQQRQKVLNKANLNRVLLLEALLVQWASLPGPG 522589 

Score =30.1 bits (66), Expect =0.58 

Identities = 17/36 (47%), Positives = 22/36 (60%) 

Frame = +2 

Query: 13 KLVASYQAGRGHHALLIQALPGMGDDALIYALSRYL 48 

+L + RGH LLI+ LPGMG L +AL+R L 
Sbjct: 613469 RLALACLLARGH--LLIEDLPGMGKTTLSHALARVL 613570 

Score =28.2 bits (61), Expect =2.2 

Identities = 18/69 (26%), Positives = 28/69 (40%) 

Frame = +1 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73 

AG H + L +GD + + QP H G CRG + + 

Sbjct: 670210 LLAALLAGYLAHLFCRRRLSLVGD^RAMRAREFHMVYQPIIHLDTGECRGVEA 670389 

Query: 74 PDYYTLAPE 82 

PD + p+ 
Sbjct: 670390 PDRSQVRPD 670416 

Score =26.2 bits (56), Expect =8.6 

Identities = 22/72 (30%), Positives = 32/72 (43%) 

Frame = +2 

Query: 258 RHHGAAQVTNVDVPGLVAEL^LSPSRLQAILGDVCHIREQLMSVTGINRELLITDLLL 317 
„, . KHHG + LV L +HL P ++ G V H E+ ++R t t j. 

Sbjct: 795185 RHHGEEAWGMAHGALVDVLGHHLHPDLH^ 795358 

Query: 318 RIEHYLQPGWL 329 

R H++ GV L 
Sbjct: 795359 RQGHHVASGVAL 795394 

gnl|PAGP|Paeruginosa_Contig53 Pseudomonas aeruginosa unfinished fragment of complete genome 
Score = 62.1 bits (148), Expect = 1e-10 

Identities = 69/268 (25%), Positives = 103/268 (37%), Gaps = 12/268 (4%) 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73 

^+ R HHA L G+G + L++ L C+ CG C C+ + G 




WARNING: These microbial genomes from are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 



NOTE: 

This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 

Commencing search, please wait for results. 

TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= ecoli.delta 
(343 letters) 

Searching...done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQs 



Sequences producing significant alignments:                          (bits)  Value 

gnl|Sanger|S.typhiContig1564 Salmonella typhi unfinished fragmen...    563   e-161 
gnl|Sanger|S.typhiContig1088 Salmonella typhi unfinished fragmen...     28   2.0 
gnl|Sanger|S.typhiContig1954.0 Salmonella typhi unfinished fragm...     28   2.0 
gnl|Sanger|S.typhiContig2054 Salmonella typhi unfinished fragmen...     26   6.0 

gnl|Sanger|S.typhiContig1564 Salmonella typhi unfinished fragment of complete genome 

Score = 563 bits (1435), Expect = e-161 

Identities = 279/343 (81%), Positives = 298/343 (86%) 

Frame = +3 

Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 60 
MIRLYPEQLRAQLNE LRAAYLLLGNDPLLLQESQDA+R AA+QGFEEHH F++DP+TD 
Sbjct: 1500 MIRLYPEQLRAQLNEWLRAAYLLLGNDPLLLQESQDAIRLAAASQGFEEHHAFTLDPSTD 1679 

Query: 61 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKA 120 

W ++FSLCQAMSLFASRQTL+L LPENGPNAA+NEQ IVRGNKL+KA 
Sbjct: 1680 WGSLFSLCQAMSLFASRQTLVLQLPENGPNAAMNEQI^TLSELLHDDLLLIVRGNKLTKA 1859 

Query: 121 QENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLA 180 
QENAAW+TALA+RSVQV+CQTPEQAQLPRWVAARAK NL+LDDAANQ+LCYCYEGNLLA 



Sbjct: 1860 QENAAWYTALADRSVQVSCQTPEQAQLPRWVAARAKAQNLQLDDAANQLLCYCYEGNLLA 2039 

Query: 181 LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 240 

LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 
Sbjct: 2040 LAQALERLSLLWPDGKLTLPRVEQAVND^ 2219 

Query: 241 SEPVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300 

SEPVI QSAHTPLRALFDKHRVWQNRR M+G+AL RL QLRQ 

Sbjct: 2220 SEPVILLRTLQRELLLLVNLKRQSAHTPLRALFDKHRVWQNRRPMIGDALQRLHPAQLRQ 2399 

Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADVFIDG 343 

AVQLLTRTE+TLKQDYGQSVWA+LEGLSLLLCHK LAD VF IDG 
Sbjct: 2400 AVQLLTRTEITLKQDYGQSVWADLEGLSLLLCHKAIADVFIDG 2528 ' 

gnl|Sanger|S.typhiContig1088 Salmonella typhi unfinished fragment of complete genome 
Length = 2112 

Score =27.8 bits (60), Expect =2.0 

Identities = 14/38 (36%), Positives = 21/38 (54%) 

Frame = -1 

Query: 270 RALFDKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTR 307 

R LF +HR + RRG G+ + Q +LR + +TR 
Sbjct: 963 RKLFQRHRPLRQRRGRRGKDHQLIFQPRLRDNLCTVTR 850 

gnl|Sanger|S.typhiContig1954.0 Salmonella typhi unfinished fragment of complete genome 
Length = 3497 

Score = 27.8 bits (60), Expect = 2.0 
Identities = 14/36 (38%), Positives = 23/36 (63%) 
Frame = +3 

Query: 54 SIDPNTDWNAIFSLCQAMSLFASRQTLLLLLPENGP 89 

+++P T W+ S QAMS FA +++ +LLP + P 
Sbjct: 1464 TVNPVTPWSP*ISRYQAMSAFARQKS--VLLPSSSP 1565 

gnl|Sanger|S.typhiContig2054 Salmonella typhi unfinished fragment of complete genome 
Length = 6017 

Score = 26.2 bits (56), Expect = 6.0 

Identities = 18/47 (38%), Positives = 28/47 (59%), Gaps = 12/47 (25%) 
Frame = -1 

Query: 263 QSAHTPLRALFDKHRV WQNRRG MMGEALNRLSQTQLRQAVQLLTRTE 309 

+ S +T LRAL+DKH V NRG M A + + + ++ V + +TE 

Sbjct: 5450 RS I YTDLRAL YDKHNVAGI TASQTNREGGASEVATMMHAADNI EKVRI ADLVI T INKTE 5274 

CPU time: 0.05 user secs. 0.01 sys. secs 0.06 total secs. 

Database: Unfinished Salmonella typhi 
Posted date: Dec 15, 1998 12:07 PM 
Number of letters in database: 4,464,430 
Number of sequences in database: 1746 



Lambda K H 

0.321 0.134 0.00 
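
The hit summaries in a report such as the one above follow a fixed layout, so the printed percentages can be re-derived from the fractions. The following is a minimal Python sketch and not part of the BLAST output itself; the helper names (`parse_hsp_summary`, `parse_expect`, `percent`) are hypothetical, and the sketch assumes the summary lines are legible (OCR-damaged lines will simply not match).

```python
import re

# Patterns for the two summary lines printed before each alignment.
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect\s*=\s*(\S+)"
)
FRACTION_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_expect(text):
    """Convert an E-value string to float; the report prints 1e-161 as 'e-161'."""
    return float("1" + text if text.startswith("e") else text)


def parse_hsp_summary(block):
    """Extract bit score, raw score, E-value and match fractions from one hit."""
    match = SCORE_RE.search(block)
    if match is None:
        raise ValueError("no Score line found")
    summary = {
        "bits": float(match.group(1)),
        "raw": int(match.group(2)),
        "expect": parse_expect(match.group(3)),
    }
    for name, num, den in FRACTION_RE.findall(block):
        summary[name.lower()] = (int(num), int(den))
    return summary


def percent(fraction):
    """Percentage for a (matches, aligned length) pair."""
    num, den = fraction
    return 100.0 * num / den


# Summary lines copied from the Salmonella typhi top hit in the report above.
example = """
Score = 563 bits (1435), Expect = e-161
Identities = 279/343 (81%), Positives = 298/343 (86%)
Frame = +3
"""
hsp = parse_hsp_summary(example)
print(hsp["bits"], hsp["expect"], int(percent(hsp["identities"])))
```

Truncating rather than rounding the computed percentage reproduces the values printed in this particular hit (81% and 86%); whether BLAST truncates in general is an assumption here, not something the report states.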




WARNING: These microbial genomes from are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 



NOTE: This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 

Commencing search, please wait for results. 

TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= deltaprime.ecoli 
(334 letters) 

Searching...done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQs 



Sequences producing significant alignments:                          Score     E 
                                                                     (bits)  Value 

gnl|Sanger|Y.pesits_Contig51 Yersinia pestis unfinished fragment...    284   9e-78 
gnl|Sanger|Y.pesits_Contig774 Yersinia pestis unfinished fragmen...     63   6e-11 
gnl|Sanger|Y.pesits_Contig695 Yersinia pestis unfinished fragmen...     28   1.8 
gnl|Sanger|Y.pesits_Contig675 Yersinia pestis unfinished fragmen...     28   2.3 
gnl|Sanger|Y.pesits_Contig777 Yersinia pestis unfinished fragmen...     27   3.0 
gnl|Sanger|Y.pesits_Contig701 Yersinia pestis unfinished fragmen...     26   6.8 

gnl|Sanger|Y.pesits_Contig51 Yersinia pestis unfinished fragment of complete genome 
Length = 20197 

Score = 284 bits (720), Expect = 9e-78 

Identities = 147/334 (44%), Positives = 192/334 (57%), Gaps = 6/334 (1%) 
Frame = -1 



Query: 1 MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG 60 

M WYPWL + +LV + GRGHHALL+ +LPG G+DALIYALSR+L+CQQ QG KSCG 
Sbjct: 15274 MNWYPWLNAPYRQLVGQHSTGRGHHALLLHSLPGNGEDALIYALSRWLMCQQRQGEKSCG 15095 

Query: 61 HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXX 120 

C C+LM AG HPD+Y L PEKGK+++GV+ VR++ +KL HA+ GGAKWW+ 
Sbjct: 15094 ECHSCRLMLAGNHPDWYVLTPEKGKSSIGVELVRQLIDKLYSHAQQGGAKWWLPHAEVL 14915 



Query: 121 XXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLS 177 

EEPP +T+F L +P LLATLRSRC YLA P + WL+ 

Sbjct: 14914 TDAAANALLKTLEEPPEKTYFLLDCHQPASLLATLRSRCFYWYLACPDTAICLQWLNLQW 14735 

Query: 178 --REVTMSQDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAA 235 

R++ + Q + WRLCL++D SLL 

Sbjct: 14734 RKRQIPVEPVAMLAALKLSEGAPLAAERLLQPERWSIRSALCSGLREALNRSDLLSLLPQ 14555 

Query: 236 LNHEQAPARLHWI^TLLMDALKRH 294 

LNH+ A RL WL++LL+DALK GA + N D LV +LA+ + L + + 
Sbjct: 14554 LNHDDAAERLQWLSSLLLDALKWQQGAGEFAVNQDQLPLVQQLAHIAATPVLLQLAKQLA 14375 

Query: 295 HIREQLMSVTGINRELLITDLLLRIEHYLQPGWLPVPHL 334 

H R QL+SV G+NRELL+T+ LL E L G +P L 
Sbjct: 14374 HCRHQLLSWGVNRELLLTEQLLSWETALSTGTYSTLPSL 14255 

gnl | Sanger | Y.pesits_Contig774 Yersinia pestis unfinished fragment of complete genome 
Length = 66020 

Score = 62.8 bits (150), Expect = 6e-11 

Identities = 40/144 (27%), Positives = 60/144 (40%) 

Frame = +2 

Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80 

GR HHA L G+G ++ L++ L C+ CG C CQ ++ G D + 

Sbjct: 2714 GRIHHAYLFSGTRGVGKTSIARLLAKGLNCETGITATPCGTCANCQEIEQGRFVDLIEI- 2890 

Query: 81 PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140 

+ V+ RE+ + + G KV + EEPPA 

Sbjct: 2891 --DAASRTKVEDTRELLDNVQYAPARGRFKVYLIDEVHMLSRHSFNALLKTLEEPPAHVK 3064 

Query: 141 FFLATREPERLLATLRSRCRLHYL 164 

F LAT +P++L T+ SRC +L 
Sbjct: 3065 FLLATTDPQKLPVTILSRCLQFHL 3136 

gnl | Sanger | Y . pesits_Contig695 Yersinia pestis unfinished fragment of complete genome 
Length = 43655 

Score =28.2 bits (61), Expect =1.8 

Identities = 8/13 (61%), Positives = 11/13 (84%) 

Frame = +3 

Query: 54 QGHKSCGHCRGCQ 66 

+GH +CGHCR C+ 
Sbjct: 9102 EGHITCGHCRNCR 9140 

gnl|Sanger|Y.pesits_Contig675 Yersinia pestis unfinished fragment of complete genome 
Length = -.1090 

Score =27.8 bits (60), Expect =2.3 

Identities = 15/41 (36%), Positives = 21/41 (50%) 

Frame = -2 



Query: 213 RETLCQALAYSVPSGDWYSLLAALNHEQAPARLHWLATLLM 253 

+E+ C + Y S YS+L+A H P RL W +LM 
Sbjct: 786 QESECLSCYYQDQSYLHYSILSACLHHWIPDRLRWPEYMLM 664 





WARNING: These microbial genomes from are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 



NOTE: This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 

Commencing search, please wait for results. 

TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= ecoli.delta 
(343 letters) 

Searching...done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQs 



Sequences producing significant alignments:                          Score     E 
                                                                     (bits)  Value 

gnl|Sanger|Y.pesits_Contig803 Yersinia pestis unfinished fragmen...    447   e-127 
gnl|Sanger|Y.pesits_Contig689 Yersinia pestis unfinished fragmen...     27   3.1 
gnl|Sanger|Y.pesits_Contig701 Yersinia pestis unfinished fragmen...     27   3.1 
gnl|Sanger|Y.pesits_Contig798 Yersinia pestis unfinished fragmen...     27   5.3 
gnl|Sanger|Y.pesits_Contig795.0 Yersinia pestis unfinished fragm...     26   6.9 
gnl|Sanger|Y.pesits_Contig765 Yersinia pestis unfinished fragmen...     26   6.9 



gnl|Sanger|Y.pesits_Contig803 Yersinia pestis unfinished fragment of complete genome 
Length = 177561 

Score = 447 bits (1138), Expect = e-127 

Identities = 223/342 (65%), Positives = 263/342 (76%) 

Frame = +1 



Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 60 
MIR+YPEQL AQL+EGLRA YLL GN+PLLLQESQD +R+VA+ F EH +F++D +T+ 
Sbjct: 50068 MIRIYPEQLVAQLHEGLRACYLLCGNEPLLLQESQDHIRRVASQHDFTEHFSFALDAHTE 50247 

Query: 61 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKA 120 
W IFSLCQA+SLFASRQTLLL P++G A I+EQ I+R NKL+KA 
Sbjct: 50248 WEHIFSLCQALSLFASRQTLLLSFPDSGLTAPISEQLVKLSGLLHPDILLILRANKLTKA 50427 



Sbjct: [illegible] ...IQLLCYCYEGNLLA 50607 

Query: 181 LAQALERLSLLWPDGKLT... [remainder illegible] 
Sbjct: [illegible] ...FTPVHWLDALLMGKSKRAWHILQQLQQED 50787 

Query: 241 SEPVIXXXXXXXXXXXXXXX... [remainder illegible] 
Sbjct: [illegible] 50967 

Query: 301 AVQLLTRTELTLKQDYGQ... [remainder illegible] 

gnl|Sanger|Y.pesits_Contig689 Yersinia pestis unfinished fragment of complete genome 
Length = [illegible] 

Score = [illegible], Expect = 3.1 
Identities = [illegible] 

gnl|Sanger|Y.pesits_Contig701 Yersinia pestis unfinished fragment of complete genome 
Length = [illegible] 

Score = [illegible], Expect = 3.1 
Identities = [illegible], Positives = [illegible] (45%) 

gnl|Sanger|Y.pesits_Contig798 Yersinia pestis unfinished fragment of complete genome 
Length = [illegible] 

Score = 26.6 bits (57), Expect = 5.3 
Identities = [illegible], Positives = [illegible], Gaps = 1/33 (3%) 
Frame = +1 

Query: 28 [alignment illegible] 

gnl|Sanger|Y.pesits_Contig795.0 Yersinia pestis unfinished fragment of complete genome 
Length = 71341 

Score = 26.2 bits (56) 
ABSTRACT 

The entire genome of the bacterium Mycoplasma 
pneumoniae M129 has been sequenced. It has a size of 
816 394 base pairs with an average G+C content of 40.0 
mol%. We predict 677 open reading frames (ORFs) and 
39 genes coding for various RNA species. Of the 
predicted ORFs, 75.9% showed significant similarity to 
genes/proteins of other organisms while only 9.9% did 
not reveal any significant similarity to gene sequences 
in databases. This permitted us tentatively to assign a 
functional classification to a large number of ORFs and 
to deduce the biochemical and physiological properties 
of this bacterium. The reduction of the genome size of 
M. pneumoniae during its reductive evolution from 
ancestral bacteria can be explained by the loss of 
complete anabolic (e.g. no amino acid synthesis) and 
metabolic pathways. Therefore, M.pneumoniae depends 
in nature on an obligate parasitic lifestyle which 
requires the provision of exogenous essential metabolites. 
All the major classes of cellular processes and 
metabolic pathways are briefly described. For a number 
of activities/functions present in M.pneumoniae according 
to experimental evidence, the corresponding genes 
could not be identified by similarity search. For instance 
we failed to identify genes/proteins involved in motility, 
chemotaxis and management of oxidative stress. 

INTRODUCTION 

The bacterium Mycoplasma pneumoniae has a genome size of 
~800 kb and completely lacks a cell wall. The bacterium is 
surrounded by a cytoplasmic membrane only, which contains 
cholesterol as an indispensable component. Mycoplasma pneumoniae 
is a human pathogen, causing 'atypical pneumonia' (1) 
usually in older children and young adults. As a surface parasite, 
it attaches to the host's respiratory epithelium by means of a 
differentiated terminal structure termed attachment organelle or tip 
structure. For a long time, research activities mainly focused on 
pathogenicity-related topics such as studies on cytadherence (2), 
vaccination and diagnosis (3). Mycoplasma pneumoniae was not 
considered as an organism suitable for basic studies partly 
because of its fastidious growth requirements and partly because 
of the lack of established standard genetic tools like conjugation 
or transformation with self-replicating vectors (4). These disadvantages 
can be compensated now to a large extent by the 
methods of molecular biology. 

DDBJ/EMBL/GenBank accession no. U00089 

Morowitz pointed out in 1984 that mycoplasmas would be 
suitable candidates for defining the genetic constitution of a 
minimal self-replicating cell (5). The advantage of these bacteria 
for such studies (6,7), mainly due to their small genome size, was 
so obvious that several initiatives were started to sequence five 
different mycoplasma genomes: Mycoplasma genitalium (8,9), 
M.pneumoniae (10), Mycoplasma capricolum (11), Mycoplasma 
mycoides (12) and a species from the related genus Ureaplasma, 
Ureaplasma urealyticum (13). So far, only the complete sequence 
of the M.genitalium genome has been published (9) which, with 
580 070 bp, is the smallest bacterial genome known so far. In the 
genus Mycoplasma, M.pneumoniae and M.genitalium are the 
closest related species. We report in this publication the complete 
nucleotide sequence of the genome of M.pneumoniae, which thus 
provides information on a second small bacterial genome. All 
M.pneumoniae genes which had been already sequenced were 
reanalyzed except for the P1 operon (14). Our sequencing 
strategy, early results and a detailed description of M.pneumoniae 
as an experimental system have been recently published (10). 



MATERIALS AND METHODS 

Mycoplasma strain 

The strain Mycoplasma pneumoniae M129 (ATCC 29342) in the 
18th broth passage was used to construct an ordered cosmid library 
containing the complete genome (15). This cosmid library was the 
basis for the DNA sequence analysis. We selected this specific 
bacterial strain because it has been used in cytadherence and 
pathogenicity studies (2,16,17). The strain in the 20th broth passage 
was still infectious in hamsters (H. Brunner, unpublished data). 



DNA sequencing 

Using the enzymatic dideoxy chain-termination method (18), the 
sequence data for this study were exclusively generated on a 
fluorescent-based sequence-gel reader (Model 373A, Applied 
Biosystems). Sequencing strategies and methods were as 
described in Hilbert et al. (10). 



*To whom correspondence should be addressed. Tel: +49 6221 54 68 27; Fax: +49 6221 54 58 93; Email: r.herrmann@mail.zmbh.uni-heidelberg.de 
Present addresses: +QIAGEN GmbH, 40724 Hilden, Germany and § Cancer Research Center (DKFZ), 69120 Heidelberg, Germany 
Computer assisted analysis 

Sequence assembly, map drawing and multiple alignments were 
done with the Lasergene program package (DNASTAR). 
Other analyses were performed with the HUSAR (Heidelberg 
Unix Sequence Analysis Resources) program package release 4.0 
at the German Cancer Research Center, Heidelberg, Germany. 
This package is based on the GCG program package version 
Unix-8.1 of the Genetics Computer Group, Wisconsin. For 
searching the DNA and protein databases [SWISS-PROT (19) and 
PIR (20)] the FASTA (21) and BLAST (22) programs (BLASTX, 
BLASTN and BLASTP) were used. Conserved motifs in proteins 
and peptides were identified by using the program PROSITE 
(23). Open reading frames (ORFs) were calculated by the 
program FRAMES allowing AUG (or GUG, UUG) as start 
codons using the Mycoplasma translation table where UGA 
codes for tryptophan (24). The G+C content was calculated by the 
program WINDOW. Codon usage was calculated with the 
program CODONFREQUENCY. 
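
The ORF rules stated above (AUG, GUG or UUG as start codons; UGA read as tryptophan rather than stop) can be illustrated with a short Python sketch. This is not the FRAMES or WINDOW program, whose internals are not described here; the function names `find_orfs` and `gc_content` and the `min_codons` parameter are hypothetical, and only the forward strand is scanned for brevity.

```python
# DNA-level equivalents of the start codons AUG, GUG and UUG.
START_CODONS = {"ATG", "GTG", "TTG"}
# Under the Mycoplasma translation table TGA (UGA) encodes tryptophan,
# so only TAA and TAG terminate an ORF.
STOP_CODONS = {"TAA", "TAG"}


def gc_content(seq):
    """G+C content of a sequence in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF runs from the first start codon seen in a frame to the next
    in-frame stop codon; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# TGA in the second codon does not end the ORF; TAA does.
toy = "ATGTGAAAATAA"
print(find_orfs(toy), round(gc_content(toy), 1))
```

With the standard genetic code the toy sequence would stop at the internal TGA; treating TGA as a sense codon is the reason the Mycoplasma table has to be used when predicting ORFs in this genome.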

The programs TopPred Ll.l (Manuel G. Carlos, Ecole 
Normale Superieure, Laboratoire de Genetique Moleculaire, 
Paris, France) and PSORT (25) (http://psort.nibb.ac.jp/) were 
used for the prediction of transmembrane domains and the 
membrane topology of proteins. 

Each ORF analysis is accessible as a FileMaker Pro (Claris) 
database which can be accessed at our world wide web (www) 
site (http://zmbh.uni-heidelberg.de/M_pneumoniae). It contains, 
besides genome and cosmid position of each ORF/gene, data 
about expression, availability of antibodies, comments, literature, 
PROSITE patterns, amino acid composition and database search 
homology scores. All the annotations in this paper were done on 
the basis of the highest score values. 

Accession number 

The complete M.pneumoniae sequence has been annotated in 
GenBank (NCBI) with the accession number U00089. 

RESULTS AND DISCUSSION 

The strategy and methodology for sequencing the complete 
genome has been described by us recently (10). A total of 2 415 202 
nucleotides primary sequence data were provided by 6385 
sequencing reactions. Each strand of the genome was completely 
sequenced at least once. The direct sequencing approach, 
combining primer walking with a limited shotgun strategy based 
on a complete cosmid and plasmid library considerably facilitated 
the assembly of the individual sequences to the entire genome 
sequence. The average redundancy of the sequencing was 2.95 
(calculated for both strands). This very low redundancy was 
achieved by the use of 5095 oligonucleotides. 
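
The redundancy quoted above can be checked against the other figures in the paragraph. The snippet below is a simple arithmetic sketch in Python using only numbers stated in the text; the variable names are illustrative, and the mean read length is a derived value that the text itself does not report.

```python
# Figures quoted in the text.
primary_sequence_nt = 2_415_202   # total nucleotides of primary sequence data
sequencing_reactions = 6385
genome_size_bp = 816_394

# Average redundancy: primary data divided by genome length.
redundancy = primary_sequence_nt / genome_size_bp

# Derived value (not stated in the text): mean usable length per reaction.
mean_read_length = primary_sequence_nt / sequencing_reactions

print(f"redundancy ~ {redundancy:.2f}, mean read ~ {mean_read_length:.0f} nt")
```

The ratio comes out within about 0.01 of the reported 2.95, so the reported value appears to be the plain quotient of the two totals.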

The complete M.pneumoniae genome has a size of 816 394 bp 
and a G+C content of 40.0 mol%. Altogether 677 open reading 
frames (ORFs) and 39 genes coding for various RNA species 
were predicted. All ORFs were sorted into categories according 
to their proposed functions (Tables 1 and 2; Fig. 1). Only 333 
ORFs (49.2%) were functionally assigned, based on significant 
sequence similarities to genes or proteins from other organisms 
with known functions (e.g. ribosomal proteins) or at least known 
categories of function (e.g. proteins involved in cytadherence). 
Significant similarities to proteins without known function from 
other bacteria, mostly M.genitalium, were shown for 181 
proposed ORFs (26.7%). We also included in this group those 
M.pneumoniae proteins which were identified in protein extracts 
of M.pneumoniae by monospecific antibodies or by the N-terminal 
amino acid sequences of enriched proteins (26,27). The group of 
ORFs without significant similarity or without indication for their 
in vivo expression comprised 109 members (16.1%); 42 of them 
carry characteristic motifs, which are not sufficient for defining 
a function. Examples of such motifs are the leucine zipper (29 
cases; referred to all predicted ORFs), the typical prokaryotic 
lipoprotein sequence pattern (46 cases) or ATP- and GTP-binding 
sites (73 cases). In addition all predicted gene products were 
analyzed by programs for structure predictions, e.g. coiled-coil 
structures (29 cases) or transmembrane segments (275 cases). 
The latter result supports the analysis of cell fractionation 
experiments which indicate that the membrane fraction contains 
~50% of the total proteins estimated by SDS-PAGE. About 8% 
of the genome is composed of repetitive DNA elements RepMP1, 
RepMP2/3, RepMP4 and RepMP5, while only 67 of all predicted 
ORFs (9.9%) code for a product without any similarity to a 
known RNA or protein. 
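
The percentages in the preceding paragraph all use the 677 predicted ORFs as denominator, which the following Python sketch makes explicit. The dictionary keys are shorthand labels chosen for this example, not the paper's category names, and the counts are taken as quoted without assuming that they form a partition of the 677 ORFs.

```python
TOTAL_ORFS = 677

# ORF counts as quoted in the text (labels are illustrative shorthand).
orf_counts = {
    "functionally assigned": 333,
    "similar to proteins of unknown function": 181,
    "no significant similarity or no expression evidence": 109,
    "no similarity to any known RNA or protein": 67,
}

# Share of all predicted ORFs, rounded to one decimal as in the text.
shares = {label: round(100.0 * n / TOTAL_ORFS, 1) for label, n in orf_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share}%")

# The first two groups together give the fraction with significant similarity.
with_similarity = round(
    shares["functionally assigned"]
    + shares["similar to proteins of unknown function"],
    1,
)
print(f"with significant similarity: {with_similarity}%")
```

The sum of the first two shares matches the 75.9% of ORFs reported in the abstract as showing significant similarity to genes or proteins of other organisms.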

Finally, 58 gene families were defined comprising 298 proteins 
with at least two but frequently with more paralogs; these are 
proteins with similarities within the same species (see www pages). 

The proposed ORFs are not equally distributed over the 
genome. A lower coding density coincides with regions of lower 
or higher G+C content than the average. There are regions with 
a G+C content of up to 56 mol%. These regions code almost 
exclusively for the gene P1 and gene ORF6 of the P1 operon, the 
repetitive DNA sequences RepMP4, RepMP2/3, RepMP5 and 
tRNAs (for details see www pages). 

The P1 protein, the main adhesin, is essential for adherence of 
M.pneumoniae to its host cell (28) and the ORF6 gene product, 
which is only found as a cleavage product, namely a 40 and 90 kDa 
protein, instead of the expected 130 kDa protein, is involved in an 
as yet unknown manner in cytadherence (14). Gene P1 contains 
a copy each of RepMP2/3 and RepMP4 and gene ORF6 one of 
RepMP5 (29). In addition, several copies of each of these 
repetitive DNA sequences can easily be recognized by their relatively 
high G+C content (Fig. 2). 

At the other extreme is the proposed origin of replication 
around nucleotide position 205 000 (pcosMPK05, dnaA region), 
with a G+C content of only 26 mol% (10). 
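
Regional deviations in G+C content such as those described above (up to 56 mol% in the repeat-rich regions, 26 mol% near the proposed origin) are typically found by scanning the sequence in windows. The Python sketch below illustrates the idea; it is not the WINDOW program used by the authors, and the function names, window size, step and thresholds are all assumptions chosen for the example.

```python
def windowed_gc(seq, window, step):
    """Return (start, G+C percent) for each full window along the sequence."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        profile.append((start, 100.0 * gc / window))
    return profile


def flag_windows(profile, low, high):
    """Keep windows whose G+C content lies outside the [low, high] range."""
    return [(start, gc) for start, gc in profile if gc < low or gc > high]


# Toy sequence: an AT-only stretch followed by a GC-only stretch.
toy = "ATATATATAT" + "GCGCGCGCGC"
profile = windowed_gc(toy, window=10, step=10)
print(flag_windows(profile, low=26.0, high=56.0))
```

On a real genome the window would be far larger (hundreds to thousands of bases) so that the profile reflects regional composition rather than local noise.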

Other regions with a low G+C content do not show a similar 
obvious coding pattern, but proposed ORFs coding for lipoproteins 
or the hsd modification/restriction system are frequently located 
in these regions. 

The total length of all coding regions is 724 174 bp. The average 
coding density of 88.7% was calculated for the M.pneumoniae 
genome which gives an average gene size of 1011 bp. Similar 



Figure 1. (Following two pages) The gene map of the complete M.pneumoniae genome. The arrows indicate the position and the size of the predicted ORFs. The colour 
refers to the functional category in which the ORFs are sorted. The complete name of an ORF can be deduced by the cosmid name above the horizontal scale-line 
and the number below the arrows (e.g. the ORF name of the first complete arrow in this figure is E07_orf1113). Rectangles above the scale-line indicate the size and 
position of different repetitive DNA sequences (see also Table 4). 
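
The coding density and average gene size quoted in the Results text can be reproduced from the totals given there. The Python sketch below uses only figures stated in the paper; dividing the coding length by all 716 predicted genes (677 ORFs plus 39 RNA genes) is an inference that happens to reproduce the reported 1011 bp, not a method the text spells out.

```python
# Figures quoted in the text.
coding_length_bp = 724_174
genome_size_bp = 816_394
predicted_orfs = 677
rna_genes = 39

# Fraction of the genome covered by coding regions, in percent.
coding_density = 100.0 * coding_length_bp / genome_size_bp

# Assumed divisor: every predicted gene, protein-coding or RNA.
average_gene_size = coding_length_bp / (predicted_orfs + rna_genes)

print(f"coding density {coding_density:.1f}%, average gene {average_gene_size:.0f} bp")
```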
Table 1. Predicted functions and classification of all M.pneumoniae ORFs 



• Biosynthesis of cofactors, prosthetic groups and carrier - Folic acid [5] 
F10_orf160 MG228 dihydrofolate reductase (dhfr); LACLA 
H10_orf506 MG213 dihydrofolate reductase (dyr) homolog protein; ENTFC 
D12_orf269 MG013 5,10-methylene-tetrahydrofolate dehydrogenase (mtdl); HAEIN 
D02_orf406 MG394 serine hydroxymethyltransferase (glyA); ACTAC 
H91_orf164 MG245 5-formyltetrahydrofolate cyclo-ligase (HI0858) homolog; HAEIN 

• Biosynthesis of cofactors, prosthetic groups and carrier - Heme and porphyrin [1] 
H91_orf453 MG259 possible protoporphyrinogen oxidase (hemK); ECOLI 

• Biosynthesis of cofactors, prosthetic groups and carrier - Thioredoxin [2] 
A65_orf102 MG124 thioredoxin (trx); YEAST 
K04_orf315 MG102 thioredoxin reductase (trxB); EUBAC 


• Cell envelope - Membranes, lipoproteins and porines [42] 
A05_orf1244 MG307 putative lipoprotein, MG307 homolog, MYCGE 
A05_orf252 MG440 putative lipoprotein, MG440 homolog, MYCGE 
A65_orf251a MG440 putative lipoprotein, MG440 homolog, MYCGE 
A65_orf787o MG260 putative lipoprotein, MG260 homolog, MYCGE 
A65_orf794 MG260 (MG185) putative lipoprotein, MG260 homolog, MYCGE 
[ORF name illegible] MG395 (MG068) putative lipoprotein, MG395 homolog, MYCGE 
D02_orf302 MG068 (MG395) putative lipoprotein, MG068 homolog, MYCGE 
D02_orf439 MG068 (MG395) putative lipoprotein, MG068 homolog, MYCGE 
D02_orf521 MG395 (MG068) putative lipoprotein, MG395 homolog, MYCGE 
D02_orf531 MG395 (MG068) putative lipoprotein, MG395 homolog, MYCGE 
[ORF name illegible] putative lipoprotein 
[ORF name illegible] MG045 putative lipoprotein, MG045 homolog, MYCGE 
[ORF name illegible] MG040 putative lipoprotein, MG040 homolog, MYCGE 
D12_orf231 putative lipoprotein 
E07_orf301 MG186 putative lipoprotein, MG186 homolog, MYCGE 
E07_orf[?]94 MG260 (MG185) putative lipoprotein, MG260 homolog, MYCGE 
E09_orf101 marginal MG440 putative lipoprotein 
E09_orf129 putative lipoprotein 
E09_orf276 MG440 putative lipoprotein, MG440 homolog, MYCGE 
E09_orf277 MG440 putative lipoprotein, MG440 homolog, MYCGE 
E09_orf279 MG439 putative lipoprotein, MG439 homolog, MYCGE 
E09_orf283a MG439 putative lipoprotein, MG439 homolog, MYCGE 
E09_orf283b MG439 putative lipoprotein, MG439 homolog, MYCGE 
E09_orf290 MG439 putative lipoprotein, MG439 homolog, MYCGE 
E09_orf300 MG439 putative lipoprotein, MG439 homolog, MYCGE 
F11_orf760 MG260 (MG185) putative lipoprotein, MG260 homolog, MYCGE 
G07_orf454 MG095 putative lipoprotein, MG095 homolog, MYCGE 
G12_orf305 MG348 putative lipoprotein, MG348 homolog, MYCGE 
GT9_orf760 MG185 putative lipoprotein, MG185 homolog, MYCGE 
GT9_orf798 MG260 putative lipoprotein, MG260 homolog, MYCGE 
H08_orf1005 MG321 putative lipoprotein, MG321 homolog, MYCGE 
H08_orf1325 MG309 putative lipoprotein, MG309 homolog, MYCGE 
H08_orf150 MG307 putative lipoprotein, MG307 homolog, MYCGE 
H08_orf237 MG307 putative lipoprotein, MG307 homolog, MYCGE 
H91_orf102 MG260 putative lipoprotein, MG260 homolog, MYCGE 
H91_orf253 putative lipoprotein 
P01_orf101 putative lipoprotein 
P02_orf1300 MG338 putative lipoprotein, MG338 homolog, MYCGE 
P02_orf793 MG260 putative lipoprotein, MG260 homolog, MYCGE 
R02_orf533 MG067 putative lipoprotein, MG067 homolog, MYCGE 
R02_orf541 MG260 putative lipoprotein, MG260 homolog, MYCGE 
VXpSPT7_orf320 MG149 putative lipoprotein, MG149 homolog, MYCGE 



• Cell envelope - Surface structures and cytadherence [8] 
E07_orf1627 MG191 (MG192) adhesin P1 (orf5, P1 operon); MYCPN 
E07_orf1218 MG192 (MG191) hypothetical 130K protein (orf6; P1 operon); MYCPN 
H08_orf274 MG318 30K adhesin-related protein; MYCPN 
H08_orf1018 MG312 cytadherence accessory protein (hmw1); MYCPN 
F10_orf1818 MG218 cytadherence accessory protein (hmw2); MYCPN 
H08_orf672 MG317 cytadherence accessory protein (hmw3); MYCPN 
D02_orf1036o MG386 protein P200; MYCPN 
F10_orf405 MG217 protein P65; MYCPN 

• Cell envelope - Surface polysaccharides, lipopolysaccharides and antigens [4] 
A65_orf399V MG137 YefE protein homolog; ECOLI 
B01_orf299V MG025 TrsB protein; YEREN 
D09_orf299 MG060 hypothetical protein YWDF homolog; BACSU 
G12_orf282b MG356 LicA protein homolog; HAEIN 



• Cellular processes - Cell division [2] 
F10_orf380 MG224 cell division protein (ftsZ); BACSU 
K05_orf709 MG457 cell division protein (ftsH); BACSU 

• Cellular processes - Cell killing [1] 
VXpSPT7_orf424 MG146 hemolysin (hlyC) homolog protein; HAEIN 

• Cellular processes - Chaperones [7] 
A05_orf595 MG305 heat shock protein DnaK; ERYRH 
C09_orf217 MG201 heat shock protein GrpE; HAEIN 
D02_orf116 MG393 heat shock protein GroES; BACSU 
D02_orf543 MG392 heat shock protein GroEL; BACSU 
D12_orf390o MG019 heat shock protein DnaJ; BACSU 
C09_orf910 MG200 DnaJ homolog protein; MYCCA 
K05_orf309 MG002 DnaJ homolog protein; YEAST 



• Cellular processes - Detoxification [1]
D12_orf442 MG008 possible thiophene and furan oxidation protein (tdhF); BACSU

• Cellular processes - Protein and peptide secretion [9]
A05_orf348 MG297 cell division protein (ftsY); ECOLI
D09_orf450 MG048 signal recognition particle protein (ffh); MYCMY
G07_orf808 MG072 preprotein translocase (secA); BACSU
GT9_orf477 MG170 preprotein translocase secY subunit; MYCCA
A65_orf581 MG138 GTP-binding membrane protein (lepA); HAEIN
F10_orf444 MG238 trigger factor (tig); HAEIN
H10_orf184 MG210 prolipoprotein signal peptidase (lsp); STACA
G07_orf389b MG086 prolipoprotein diacylglyceryl transferase (lgt); ECOLI
F11_orf339 MG270 lipoate protein ligase (lplA); ECOLI



• Central intermediary metabolism - Other [5]
A05_orf241a MG293 glycerophosphoryl diester phosphodiesterase (glpQ); BACSU
A05_orf320 MG299 phosphotransacetylase (pta); BACSU
D09_orf508 MG038 glycerol kinase (glpK); HAEIN
G12_orf390 MG357 acetate kinase (ackA); BACSU
H03_orf237 MG385 glycerophosphoryl diester phosphodiesterase (glpQ); STAAU



• Central intermediary metabolism - Phosphorus compounds [1]

G12_orf184 MG351 inorganic pyrophosphatase (ppa); THEAC



• Energy metabolism - Aerobic [3]
K05_orf312 MG460 L-lactate dehydrogenase (ldh); MYCHY
D09_orf384 MG039 aerobic glycerol-3-phosphate dehydrogenase (glpD); ECOLI
F11_orf479 MG275 NADH oxidase (nox); ENTFA

• Energy metabolism - Amino acids and amines [5]
F10_orf309 - carbamate kinase (EC 2.7.2.2) (arcC); PSEAE
H03_orf438 - arginine deiminase (arcA); PSEPU
H10_orf198 - arginine deiminase (arcA); MYCCA
H10_orf238 - arginine deiminase (arcA); MYCCA
H10_orf273o - ornithine carbamoyl transferase (otel); ECOLI

• Energy metabolism - Anaerobic [1]
H03_orf351 - NADP-dependent alcohol dehydrogenase (adh); THEBR



• Energy metabolism - ATP-proton motive force interconversion [9]
C12_orf293o MG405 ATP synthase A chain (atpB); MYCGA
D02_orf207 MG403 ATP synthase B chain (atpF); MYCGA
D02_orf105 MG404 ATP synthase C chain (atpE); MYCGA
C12_orf157L MG406 ATP synthase protein I (atpI); MYCGA
D02_orf518 MG401 ATP synthase alpha chain (atpA); MYCGA
D02_orf475 MG399 ATP synthase beta chain (atpD); MYCGA
D02_orf279 MG400 ATP synthase gamma chain (atpG); MYCGA
D02_orf178 MG402 ATP synthase delta chain (atpH); MYCGA
D02_orf133a MG398 ATP synthase epsilon chain (atpC); MYCGA



• Energy metabolism - Glycolysis [10]
A05_orf337 MG301 glyceraldehyde-3-phosphate dehydrogenase (gap); CLOPA
A05_orf409 MG300 phosphoglycerate kinase (pgk); THEMA
B01_orf288 MG023 fructose-bisphosphate aldolase (tsr); BACSU
C12_orf244 MG431 triosephosphate isomerase (tim); ECOLI
C12_orf456 MG407 enolase (eno) (EC 4.2.1.11); PLAFA
C12_orf508 MG430 phosphoglycerate mutase (pgm); BACSU
H10_orf328 MG215 6-phosphofructokinase (pfk); ECOLI
H10_orf508 MG216 pyruvate kinase (pyk); LACLA
K04_orf430 MG111 phosphoglucose isomerase B (pgiB); BACST
R02_orf300 MG063 1-phosphofructokinase (fruK); HAEIN



• Energy metabolism - Pentose phosphate pathway [2]
P02_orf242 - L-ribulose-5-phosphate 4-epimerase (araD); ECOLI
R02_orf648 MG066 transketolase I (TK I; tktB); RHOSH

• Energy metabolism - Pyruvate DHase [4]
F11_orf327 MG273 pyruvate dehydrogenase E1-beta subunit (pdhB); ACHLA
F11_orf358a MG274 pyruvate dehydrogenase E1-alpha subunit (pdhA); ACHLA
F11_orf402 MG272 dihydrolipoamide acetyltransferase component (E2) (pdhC); ACHLA
F11_orf457 MG271 dihydrolipoamide dehydrogenase (pdhD); BACST

• Energy metabolism - Sugars [5]
D02_orf152 MG396 galactose-6-phosphate isomerase subunit (LacA); STRMU
D09_orf224 MG050 deoxyribose-phosphate aldolase (deoC); MYCPN
D09_orf554 MG053 phosphomannomutase (cpsG); MYCPI
E09_orf364 - mannitol-1-phosphate 5-dehydrogenase (EC 1.1.1.17) (mtlD); STRMU
K04_orf215L MG112 D-ribulose-5-phosphate 3 epimerase (cfxE); ALCEU

• Fatty acid and phospholipid metabolism [9]
A65_orf227 MG114 phosphatidylglycerophosphate synthase (pgsA); HAEIN
C09_orf600 - carnitine palmitoyltransferase II precursor (cpt2); HUMAN
E30_orf395 MG437 CDP-diglyceride synthetase (cdsA); HAEIN
F11_orf84 MG287 (acyl carrier protein; STRGA)
G12_orf272V MG344 triacylglycerol lipase (lip) 3; MYCMY
G12_orf328a MG368 fatty acid/phospholipid synthesis protein (plsX); ECOLI
H08_orf289 MG310 triacylglycerol lipase (lip) 3; Mycoplasma sp
H10_orf266 MG212 1-acyl-sn-glycerol-3-phosphate acyltransferase (plsB); YEAST
P01_orf268 MG327 triacylglycerol lipase (lip) 2; MYCMY

• Purines, pyrimidines, nucleosides and nucleotides - 2'-Deoxyribonucleotide metabolism [3]
F10_orf328 MG227 thymidylate synthase (thyA); STAAU
F10_orf339 MG229 ribonucleotide reductase 2 (nrdF); SALTY
F10_orf721 MG231 ribonucleoside-diphosphate reductase (nrdE); SALTY

• Purines, pyrimidines, nucleosides and nucleotides - Nucleotide and nucleoside interconversions [2]
C12_orf235 MG434 uridylate kinase (pyrH); ECOLI
H03_orf213 MG382 uridine kinase (udk); HAEIN

• Purines, pyrimidines, nucleosides and nucleotides - Purine ribonucleotide biosynthesis [3]
D09_orf388 MG058 phosphoribosylpyrophosphate synthetase (prs); SYNP
GT9_orf215 MG171 adenylate kinase (adk); BACST
K04_orf239 MG107 5'guanylate kinase (gmk); HAEIN

• Purines, pyrimidines, nucleosides and nucleotides - Salvage of nucleosides and nucleotides [9]
B01_orf178 MG030 uracil phosphoribosyltransferase (upp); STRSL
B01_orf191 MG034 thymidine kinase (tdk); BACSU
D09_orf133 MG052 cytidine deaminase (cdd); MYCPI
D09_orf238 MG049 purine-nucleoside phosphorylase (deoD); ECOLI
D09_orf421 MG051 thymidine phosphorylase (deoA); MYCPI
F11_orf133 MG276 adenine phosphoribosyltransferase (apt); HAEIN
K05_orf175 MG458 hypoxanthine-guanine phosphoribosyltransferase (HPT); LACLA
P01_orf217 MG330 cytidylate kinase (cmk); BACSU
D12_orf210 MG006 thymidylate kinase (CDC8) homolog, MYCGE

• Purines, pyrimidines, nucleosides and nucleotides - Sugar-nucleotide biosynthesis and conversions [2]
A65_orf33S MG118 UDP-glucose 4-epimerase (galE); STRTR
K05_orf291 MG453 UDP-glucose pyrophosphorylase (gtaB); BACSU



• Pyridine nucleotide synthesis [1]
H03_orf248 MG383 probable NH(3)-dependent NAD(+) synthetase (outB); BACSU



• Regulatory function [8]
B01_orf362 MG024 hypothetical protein (yyaF) homolog; BACSU
C09_orf351 MG205 protein hrcA homolog; BACSU
D02_orf291 MG387 GTP-binding protein era homolog; STRMU
F11_orf733 MG278 (MG376) stringent response protein SpoT; ECOLI
H03_orf433 MG384 GTP-binding protein (obg); BACSU
K04_orf726 MG104 virulence associated protein homolog (vacB); HAEIN
P01_orf193 MG335 hypothetical protein YihA (era like) homolog; ECOLI
P01_orf292 MG329 hypothetical protein HI0136 (era like) homolog; HAEIN

• Replication - DNA replication, restriction, modification, recombination and repair [46]
A65_orf711 MG122 DNA topoisomerase I (topA); BACSU
A19_orf291 MG262 DNA polymerase I (polI, 5'-3' exonuclease) homolog; STRPN
A19_orf872 MG261 DNA polymerase III alpha subunit (dnaE); HAEIN
B01_orf1443 MG031 DNA polymerase III (dnaE) alpha chain (3'-5' exonuclease); BACSU
K05_orf380 MG001 DNA polymerase III beta subunit (dnaN); STAAU
D12_orf253 MG007 DNA polymerase III subunit delta' (holB); ECOLI
C12_orf681 MG420 (C-Term: MG419) DNA polymerase III subunit gamma and tau (dnaX); ECOLI
G07_orf473 MG094 replicative DNA helicase (dnaC); BACSU
H91_orf620 MG250 DNA primase (dnaG); BACSU
D12_orf212 MG010 DNA primase motif (dnaG); CLOAB
H91_orf658 MG254 DNA ligase (lig); ECOLI
G07_orf166 MG091 single-stranded DNA binding protein (ssb); HAEIN
K05_orf439 MG469 chromosomal replication initiator protein (dnaA); MYCCA
P02_orf336 MG339 recombination protein (recA); STAAU
C09_orf635 MG203 topoisomerase IV subunit B (parE); BACSU
C09_orf789 MG204 topoisomerase IV subunit A (parC); BACSU
K05_orf650 MG003 DNA gyrase subunit B (gyrB); MYCPN
K05_orf839o MG004 DNA gyrase subunit A (gyrA); STAAU
G12_orf206 MG358 Holliday junction DNA helicase (ruvA); ECOLI
G12_orf307 MG359 Holliday junction DNA helicase (ruvB); HAEIN
H91_orf715 MG244 DNA helicase II (mutB1); HAEIN
H91_orf529 MG244 DNA helicase pcrA homolog; STAAU
F10_orf286 MG235 endonuclease IV (nfo); ECOLI
C12_orf948L MG421 excinuclease ABC subunit A (uvrA); ECOLI
G07_orf657 MG073 excinuclease ABC subunit B (uvrB); ECOLI
C09_orf586L MG206 excinuclease ABC subunit C (uvrC); BACSU
G12_orf412 MG360 UV protection protein (mucB); ECOLI
A19_orf277 MG(M2) formamidopyrimidine-DNA glycosylase (fpg); BACFI
A65_orf306 - PrrB homolog protein; ECOLI
D09_orf383 MG047 S-adenosyl methionine synthetase 2 (metX); ECOLI
G07_orf240 MG097 uracil DNA glycosylase (ung); ECOLI
C12_orf249 - restriction-modification enzyme subunit S1B (hsdS); MYCPU
GT9_orf238 - type I restriction enzyme ecokI specificity protein (hsdS) homolog; HAEIN
GT9_orf319V MG184 adenine-specific methyltransferase EcoRI (mtel); ECOLI
H03_orf191 MG380 glucose inhibited division protein (gidB); ECOLI
H03_orf612 MG379 glucose inhibited division protein (gidA); ECOLI
H10_orf145L - type I restriction enzyme ecokI specificity protein (hsdS) homolog; HAEIN
H10_orf187V - HsdS1B protein homolog; MYCPU
H91_orf206 - type I restriction enzyme (hsdR) homolog; ECOLI
H91_orf268 - type I restriction enzyme ecokI specificity protein (hsdS) homolog; HAEIN
H91_orf330 - type I restriction enzyme ecokI specificity protein (hsdS) homolog; HAEIN
H91_orf376 - type I restriction enzyme (hsdR) homolog; ECOLI
H91_orf543 - type I restriction enzyme (hsdM); ECOLI
P02_orf363V - type I restriction enzyme ecokI specificity protein (hsdS) homolog; HAEIN
R02_orf335 - type I restriction enzyme ecokI specificity protein (hsdS) homolog; HAEIN
E30_orf375 MG438 MG438 homolog, MYCGE



• Transcription - Degradation of RNA [2]
G12_orf282a MG367 ribonuclease III (rnc); ECOLI
K05_orf118V MG465 RNase P C5 chain (rnpA); MYCCA

• Transcription - RNA synthesis, modification and DNA transcription [11]
GT9_orf327 MG177 RNA polymerase alpha core subunit (rpoA); BACSU
G12_orf1391o MG341 RNA polymerase beta subunit (rpoB); BACSU
F04_orf1290 MG340 DNA-directed RNA polymerase beta' chain (rpoC); THEMA
B01_orf146 MG022 DNA-directed RNA polymerase delta subunit (rpoE); BACSU
H91_orf499 MG249 RNA polymerase sigma-A factor (sigA); BACSU
F11_orf160 MG282 transcription elongation factor (greA); RICPR
D09_orf320 MG054 transcription antitermination factor (nusG); BACSU
E07_orf540o MG141 N-utilization substance protein A homolog (nusA); BACSU
C12_orf450 MG425 ATP-dependent RNA helicase (deaD); HAEIN
H08_orf409 MG308 ATP-dependent RNA helicase (deaD); ECOLI
D12_orf1030 MG018 hypothetical helicase Yb95 homolog; YEAST



• Translation - Aminoacyl tRNA synthetases and tRNA modification [24]
A05_orf900 MG292 alanyl-tRNA synthetase (alaS); ECOLI
H03_orf537 MG378 arginyl-tRNA synthetase (argS); BRELA
K04_orf455o MG113 asparaginyl-tRNA synthetase (asnS); ECOLI
D09_orf557 MG036 aspartyl-tRNA synthetase (aspS); THEAQ
H91_orf437 MG253 cysteinyl-tRNA synthetase (cysS); BACSU
K05_orf484 MG462 glutamyl-tRNA synthetase (gltX); BACST
H91_orf449 MG251 glycyl-tRNA synthetase (grs1); YEAST
B01_orf414o MG035 histidyl-tRNA synthetase (hisS); STREQ
G12_orf861 MG345 isoleucine-tRNA ligase (ileS); STAAU
F11_orf793o MG266 leucyl-tRNA synthetase (leuS); BACSU
A65_orf489 MG136 lysyl-tRNA synthetase (lysS); BACSU
G12_orf311 MG365 methionyl-tRNA formyltransferase (fmt); ECOLI
B01_orf512 MG021 methionyl-tRNA synthetase (metS); BACST
G07_orf188 MG083 peptidyl-tRNA hydrolase homolog (pth); HAEIN
C09_orf341 MG194 phenylalanyl-tRNA synthetase alpha-subunit (pheS); BACSU
C09_orf805 MG195 phenylalanyl-tRNA synthetase beta chain (pheT); BACSU
GT9_orf243V MG182 pseudouridylate synthase I (hisT); ECOLI
F11_orf483 MG283 putative prolyl-tRNA synthetase (YHIO; proS); YEAST
D12_orf420 MG005 seryl-tRNA synthetase (serS); BACSU
G12_orf564 MG375 threonyl-tRNA synthetase (thrSv); BACSU
K05_orf210 MG445 tRNA (guanine-N1)-methyltransferase (trmD); HUMAN
A65_orf346 MG126 tryptophanyl-tRNA synthetase (trpS); HAEIN
K05_orf399 MG455 tyrosyl tRNA synthetase (tyrS); BACCA
P01_orf838 MG334 valyl-tRNA synthetase (valS); BACST



• Translation - Degradation of proteins, peptides and glycopeptides [8]

B01_orf309 MG020 proline iminopeptidase (pip); NEIGO

D02_orf445 MG391 nonspecified aminopeptidase; MYCSA

D09_orf319 MG046 o-sialoglycoprotein endopeptidase (gcp); PASHA

F10_orf795 MG239 ATP-dependent protease (lon); BACSU

G12_orf715 MG355 ATP-dependent protease binding subunit (clpB) homolog; HAEIN

GT9_orf611 MG183 oligoendopeptidase F (pepF); LACLA

H03_orf193o MG377 MG377 homolog (put. zinc protease), MYCGE

P01_orf354 MG324 X-Pro dipeptidase (pepX); LACDE



• Translation - Protein modification and translation factors [15]
GT9_orf78 MG173 initiation factor 1 (infA); BACSU
VXpSPT7_orf617 MG142 protein synthesis initiation factor 2 (infB); BACST
C09_orf201 MG196 translation initiation factor IF3 (infC); MYCFE
G07_orf688 MG089 elongation factor G (fus); THEAQ
B01_orf190 MG026 elongation factor P (efp) homolog; HAEIN
C12_orf298 MG433 elongation factor Ts (tsf); SPICI
K05_orf394 MG451 elongation factor TU (tuf); MYCGE
H91_orf359V MG258 peptide chain release factor 1 (RF1; prfA); BACSU
E30_orf184 MG435 ribosome releasing factor (frr); HAEIN
GT9_orf248 MG172 methionine aminopeptidase (map); BACSU
K04_orf216 MG106 polypeptide deformylase (def); HAEIN
K04_orf259 MG108 protein phosphatase 2C homolog; YEAST
K04_orf389 MG109 probable protein serine/threonine kinase; CAEEL
K05_orf151 MG448 pilB homolog (fragment); HAEIN
C12_orf157 MG408 peptide methionine sulfoxide reductase (pmsR); ECOLI


• Translation - Ribosomal proteins: synthesis and modification [53]
G07_orf226 MG082 ribosomal protein L1 (rpL1); BACST
VXpSPT7_orf287a MG154 ribosomal protein L2 (rpL2); MYCCA
VXpSPT7_orf287b MG151 ribosomal protein L3 (rpL3); MYCCA


VXpSPT7_orf212 MG152 ribosomal protein L4 (rpL4); MYCCA
fiTQ orflROb MG163 ribosomal protein L5 (rpL5); HAEIN
fiTQ orfl R4 MG166 ribosomal protein L6 (rpL6); MYCCA
fil? orfl 52 MG362 ribosomal protein L7/L12 ('A' type) (rpL7/L12); MICLU
G07_orf149 MG093 ribosomal protein L9 (rpL9); BACST
fi!5 orflfil MG361 ribosomal protein L10 (rpL10); THEMA
fi07 orfl"V7 MG081 ribosomal protein L11 (rpL11); THEMA
C12_orf146 MG418 ribosomal protein L13 (rpL13); ECOLI
fiTQ orfl 55 MG161 ribosomal protein L14 (rpL14); BACST
GT9_orf151 MG169 ribosomal protein L15 (rpL15); MYCCA
MG158 ribosomal protein L16 (rpL16); MYCCA
GT9_orf124a MG178 ribosomal protein L17 (rpL17); BACSU
fiTQ orfl 1 fih MG167 ribosomal protein L18 (rpL18); BACST
ifrt< nrfllQ MG444 ribosomal protein L19 (rpL19); BACST
MG198 ribosomal protein L20 (rpL20); MYCFE
F10_orf100b MG232 ribosomal protein L21 (rpL21); BACSU
VXpSPT7 nrflR4 MG156 ribosomal protein L22 (rpL22); HAEIN
VXpSPT7 nrf517 MG153 ribosomal protein L23 (rpL23); THEMA
GT9_orf111a MG162 ribosomal protein L24 (rpL24); BACST
Pin nrfl fid MG234 ribosomal protein L27 (rpL27); BACSU
v- 1 z_onoj MG426 ribosomal protein L28 (rpL28); BACSU
vj i y__on iiio MG159 ribosomal protein L29 (rpL29); THEMA
rl7 1 _OU7 / MG257 ribosomal protein L31 (rpL31); ECOLI
vj i z_on j / MG363 ribosomal protein L32 (rpL32); HAEIN
rui_onjj MG325 ribosomal protein L33 (rpL33); BACST
MG466 ribosomal protein L34 (rpL34); PROMI
MG197 ribosomal protein L35 (rpL35); BACST
vj i yjouj / MG174 ribosomal protein L36 (rpL36); CHLTR
vju/_onzv** MG070 ribosomal protein S2 (rpS2); SPIPL
WnCDn rti-rm MG157 ribosomal protein S3 (rpS3); MYCCA
fius_o n zu j MG311 ribosomal protein S4 (rpS4); BACSU


ribosomal protein S4 (rpS4); BACSU 


GT9_orf219 MG168 ribosomal protein S5 (rpS5); BACSU
G07_orf215 MG090 ribosomal protein S6 (rpS6); ECOLI
G07_orf155 MG088 ribosomal protein S7 (rpS7); BACST
GT9_orf142 MG165 ribosomal protein S8 (rpS8); MYCCA
C12_orf132 MG417 ribosomal protein S9 (rpS9); BACST
VXpSPT7_orf108 MG150 ribosomal protein S10 (rpS10); THEMA
GT9_orf121 MG176 ribosomal protein S11 (rpS11); BACST
G07_orf139 MG087 ribosomal protein S12 (rpS12); BACST
GT9_orf124b MG175 ribosomal protein S13 (rpS13); BACSU
GT9_orf61 MG164 ribosomal protein S14 (rpS14); MYCCA
C12_orf86 MG424 ribosomal protein S15 (BS18); BACST
K05_orf88 MG446 ribosomal protein S16 (BS17); BACSU
GT9_orf85 MG160 ribosomal protein S17 (rpS17); MYCCA
G07_orf104b MG092 ribosomal protein S18 (rpS18); ECOLI
VXpSPT7_orf87 MG155 ribosomal protein S19 (rpS19); MYCBO
G12_orf87 MG(M3) ribosomal protein S20 (rpsT); ECOLI
D12_orf288 MG012 ribosomal protein S6 modification protein (rimK); ECOLI
H91_orf242a MG252 hypothetical protein YacO (rRNA methylase) homolog; BACSU
VXpSPT7_orf116 MG143 ribosome binding factor A homolog (rbfA); ECOLI



• Transport and binding proteins - ABC transport [34]
A05_orf382 MG303 abc transport ATP-binding protein (artP); ECOLI
D09_orf286a MG044 spermidine/putrescine transport system permease (potI); ECOLI
D09_orf286b MG043 spermidine/putrescine transport system permease (potB); HAEIN
D09_orf560L MG042 spermidine/putrescine transport ATP-binding prot (potA); ECOLI
F10_orf491 MG225 hypothetical protein (gi: 710640) homolog (put amino acid permease); CLOPE
F10_orf503 MG226 general amino acid permease GAP1 homolog; YEAST
G07_orf376 MG078 oligopeptide transport system permease protein (amiD); STRPN
G07_orf389a MG077 oligopeptide transport system permease protein (oppB); BACSU
G07_orf423 MG079 oligopeptide transport ATP-binding protein (oppD); BACSU
G07_orf851 MG080 oligopeptide transport ATP-binding protein (oppF); BACSU
GT9_orf303 MG180 histidine transport ATP-binding protein (hisP); ECOLI
R02_orf465 MG065 glutamine transport ATP-binding protein (glnQ); ECOLI
C12_orf225 MG409 phosphate transport system regulatory protein (phoU); ECOLI
C12_orf329 MG410 phosphate transport ATP-binding protein (pstB); ECOLI
C12_orf651V MG411 phosphate transport system permease protein (pstA); ECOLI
GT9_orf274 MG179 sulfate transport ATP-binding protein (cysA); SYNP
K05_orf284 MG065 (MG467) sulfate transport ATP-binding protein (cysA); SYNP
A65_orf311 MG121 high affinity ribose transport protein (rbsC); HAEIN
A65_orf572 MG119 hypothetical ABC transporter (yjcW) homolog; ECOLI
E07_orf319 MG189 sn-glycerol-3-phosphate transport system permease protein (ugpE); ECOLI
E07_orf329 MG188 sn-glycerol-3-phosphate transport system permease protein (ugpA); ECOLI
E07_orf586 MG187 sn-glycerol-3-phosphate transport system permease protein (ugpC); ECOLI
A05_orf270L MG304 abc transport ATP-binding protein (cbiO); SALTY
G07_orf872V MG071 Mg(2+) transport ATPase, P-type 1 (mgtA); ECOLI
A05_orf244 MG290 ATP-binding protein P29; MYCHR
A05_orf380V MG289 high affinity transport system protein P37; MYCHR
A05_orf542 MG291 transport system permease protein P69; MYCHR
D02_orf660 MG390 lactococcin transport ATP-binding protein (lcnDR3); LACLA
D12_orf623 MG014 transport ATP-binding protein (pmd1); SCHPO
D12_orf634 MG015 transport ATP-binding protein (msbA); HAEIN
F10_orf326 MG179 bcrA homolog protein; BACLI
F10_orf750 - putative ABC transport permease
H08_orf565 MG322 Na(+) translocating ATPase subunit J (ntpJ); ENTHR
K05_orf339 MG467 devA protein homolog; ANASP

• Transport and binding proteins - PTS transport [7]
E09_orf143V - PTS system mannitol-specific component IIA (EIIA-MTL) (mtlF); STRMU
E09_orf379 - PTS system mannitol-specific component IIA (EIIA-MTL) (mtlA); STACA
R02_orf694 MG062 fructose-permease IIBC component (fruA); ECOLI
GT9_orf940o MG069 PTS system, glucose-specific IIABC component (EIIABC-GLC); BACSU
D09_orf88 MG041 phosphocarrier protein HPr (ptsH); MYCCA
P02_orf159 - hypothetical phosphotransferase protein YjfU homolog; ECOLI
C12_orf572 MG429 PEP-dependent HPr protein kinase phosphoryltransferase (Enzyme I) (ptsI); STRSL



• Transport and binding proteins - Other transport systems [3]
B01_orf264 MG033 glycerol uptake facilitator (glpF); BACSU
R02_orf564o MG061 hexosephosphate transport protein (uhpT); SALTY
A05_orf475 MG294 MG294 homolog (put. permease), MYCGE



• Other categories - Adaptations and atypical conditions [3]

K05_orf140 MG454 osmotically inducible protein (osmC); ECOLI

K05_orf270 MG470 soj homolog protein; BACSU

K05_orf263V MG463 S-adenosylmethionine-6-N',N'-adenosyl (rRNA) dimethyltransferase (ksgA); ECOLI



• Other categories
hypothetical 11.7 KD protein homolog (ylxM); BACSU
A05_orf1?0 MG296 MG296 homolog, MYCGE
A05_orf?90 - hypothetical protein (YidA) homolog; ECOLI
A05 orfin MG302 MG302 homolog, MYCGE
A05_orf370 MG295 hypothetical protein (HI0174); HAEIN
A05_orf395 MG306 MG306 homolog, MYCGE


A05_orf982 MG298 P115 protein homolog (SGC3); MYCHR
A19_orf200 MG264 hypothetical protein (HI0890) homolog; HAEIN
A19_orf282 MG265 hypothetical protein (YidA) homolog; ECOLI
A19_orf292 MG263 hypothetical protein (YidA) homolog; ECOLI
A65_orf100 MG134 hypothetical protein YaaK homolog; BACSU
A65_orf117 MG129 MG129 homolog, MYCGE
A65_orf144 MG132 hypothetical protein Hit1 homolog; YEAST
A65_orf145 MG127 hypothetical protein Ygl1 homolog; STRVR
A65_orf166 MG260 (MG185) MG260 homolog, MYCGE
A65_orf223 MG117 MG117 homolog, MYCGE
A65_orf251b MG116 MG116 homolog, MYCGE
A65_orf259 MG128 hypothetical protein HI0072 homolog; HAEIN
A65_orf266 MG133 MG133 homolog, MYCGE
A65_orf281 MG125 hypothetical protein (gi: 973220) homolog; ECOLI
A65_orf285 MG135 MG135 homolog, MYCGE
A65_orf377 MG260 (MG185) MG260 homolog, MYCGE
A65_orf475 MG123 MG123 homolog, MYCGE
A65_orf493 MG130 hypothetical protein Ysri homolog; MYCMY
A65_orf517 MG120 MG120 homolog, MYCGE
A65_orf569 MG139 MG139 homolog, MYCGE
B01_orf108 MG029 hypothetical protein (gi: 606093) homolog; ECOLI
B01_orf168 MG027 MG027 homolog, MYCGE
B01_orf186L MG032 MG032 homolog, MYCGE
B01_orf203 MG028 MG028 homolog, MYCGE
B01_orf338 MG032 MG032 homolog, MYCGE
B01_orf666 MG032 MG032 homolog, MYCGE
B01_orf672 MG032 MG032 homolog, MYCGE
B01_orf673 MG032 MG032 homolog, MYCGE
C09_orf104 MG191 (MG191 homolog, MYCGE)
C09_orf121 MG202 MG202 homolog, MYCGE
C09_orf143b MG199 MG199 homolog, MYCGE
C09_orf159 MG207 MG207 homolog, MYCGE
C12_orf14t MG427 MG427 homolog, MYCGE
C12_orf172 MG428 MG428 homolog, MYCGE
C12_orf334 MG413 (MG414) MG413 homolog, MYCGE
C12_orf344 MG415 MG415 homolog, MYCGE
C12_orf385 MG412 MG412 homolog, MYCGE
C12_orf404 MG432 hypothetical protein (yfiB) homolog; SPICI
C12_orf561 MG423 MG423 homolog, MYCGE
C12_orf839 MG422 MG422 homolog, MYCGE
C12_orf997 MG414 MG414 homolog, MYCGE
D02_orf108 MG388 MG388 homolog, MYCGE
D02_orf129 MG389 MG389 homolog, MYCGE
D02_orf135L MG067 (MG395, MG068) MG067 homolog, MYCGE
D02_orf140 MG395 (MG068) MG395 homolog, MYCGE
D02_orf150 MG068 (MG395) MG068 homolog, MYCGE
D02_orf157L MG395 (MG068) MG395 homolog, MYCGE
D02_orf225L MG068 (MG067, MG395) MG068 homolog, MYCGE
D02_orf265V MG068 (MG395, MG067) MG068 homolog, MYCGE
D02_orf346 MG068 (MG395) MG068 homolog, MYCGE
D02_orf347 MG067 (MG395, MG068) MG067 homolog, MYCGE
D02_orf353V MG068 (MG395) MG068 homolog, MYCGE
D02_orf569 MG397 MG397 homolog, MYCGE
D09_orf125 MG055 MG055 homolog, MYCGE
D09_orf147 MG059 hypothetical protein A43259 homolog; ENTHR
D09_orf178 MG057 hypothetical protein YabF homolog; BACSU
D09_orf276 MG056 hypothetical protein YabC homolog; BACSU
D09_orf451 MG037 pre-B cell enhancing factor homolog (pbeF); HUMAN
D09_orf518 MG096 MG096 homolog, MYCGE
D09_orf632 MG288 (MG096) MG288 homolog, MYCGE
D12_orf261 MG009 hypothetical protein yabD homolog; BACSU
D12_orf285 MG011 MG011 homolog, MYCGE
E07_orf1113 MG140 MG140 homolog, MYCGE
E07_orf265 MG260 (MG185) MG260 homolog, MYCGE
E07_orf324 MG190 hypothetical 28K protein (orf4, P1 operon); MYCPN
E07_orf485 MG260 (MG185) MG260 homolog, MYCGE


E09_orf136 MG441 MG441 homolog, MYCGE
E09_orf204o - protein P30; MYCPN
E09_orf287o MG439 MG439 homolog, MYCGE
E09_orf302 MG440 MG440 homolog, MYCGE
F04_orf154 MG288 (MG096) MG288 homolog, MYCGE
F04_orf260V MG288 MG288 homolog, MYCGE
F10_orf100a MG233 hypothetical protein YsxB homolog; BACSU
F10_orf141b MG221 hypothetical protein YabB homolog; ECOLI
F10_orf153 MG230 MG230 homolog, MYCGE
F10_orf158 MG236 MG236 homolog, MYCGE
F10_orf291 MG240 MG240 homolog, MYCGE
F10_orf294 MG237 MG237 homolog, MYCGE
F10_orf308 MG222 hypothetical protein YabC homolog; ECOLI
F10_orf419 MG223 MG223 homolog, MYCGE
F10_orf621 MG241 MG241 homolog, MYCGE
F10_orf632o MG242 MG242 homolog, MYCGE
F10_orf90 MG220 MG220 homolog, MYCGE
F11_orf114 MG267 MG267 homolog, MYCGE
F11_orf122a MG284 MG284 homolog, MYCGE
F11_orf197 MG286 MG286 homolog, MYCGE
F11_orf218 MG279 MG279 homolog, MYCGE
F11_orf229 MG268 hypothetical protein YaaF homolog; BACSU
F11_orf287 MG280 MG280 homolog, MYCGE
F11_orf346 MG285 MG285 homolog, MYCGE
F11_orf358b MG269 MG269 homolog, MYCGE
F11_orf582 MG281 MG281 homolog, MYCGE
F11_orf887 MG277 MG277 homolog, MYCGE
G07_orf1030 MG075 protein P100; MYCPN
G07_orf135 MG074 MG074 homolog, MYCGE
G07_orf138 MG076 MG076 homolog, MYCGE
G07_orf289 MG084 hypothetical protein (yacA) homolog; BACSU
G07_orf312 MG085 MG085 homolog, MYCGE
G07_orf417 MG288 (MG096) MG288 homolog, MYCGE
G07_orf478o MG100 PET112 protein homolog; YEAST
G07_orf478V MG099 amidase homolog (S47454); YEAST
G07_orf479 MG098 MG098 homolog, MYCGE
G12_orf104 MG376 MG376 homolog, MYCGE
G12_orf109 MG353 MG353 homolog, MYCGE
G12_orf136 MG354 MG354 homolog, MYCGE
G12_orf166a MG342 MG342 homolog, MYCGE
G12_orf166b MG346 hypothetical protein Ygl3 homolog; BACST
G12_orf210V MG347 hypothetical protein HI0340 homolog; HAEIN
G12_orf218 MG364 MG364 homolog, MYCGE
G12_orf269 MG374 MG374 homolog, MYCGE
G12_orf281 MG373 MG373 homolog, MYCGE
G12_orf325 MG371 hypothetical 28K protein (P1 operon) homolog; MYCPN
G12_orf326 MG370 hypothetical protein (HI0176) homolog; HAEIN
G12_orf328b MG350 MG350 homolog, MYCGE
MG343 MG343 homolog, MYCGE
G12_orf387 MG372 MG372 homolog, MYCGE
G12_orf413 MG349 MG349 homolog, MYCGE
G12_orf558 MG369 MG369 homolog, MYCGE
G12_orf664 MG366 MG366 homolog, MYCGE
GT9_orf148 MG260 MG260 homolog, MYCGE
GT9_orf434 MG181 MG181 homolog, MYCGE
H03_orf235 MG381 MG381 homolog, MYCGE
H08_orf157b MG321 MG321 homolog, MYCGE
H08_orf193 MG319 MG319 homolog, MYCGE
H08_orf231 MG323 hypothetical protein YZAC homolog; BACSU
H08_orf263 MG313 MG313 homolog, MYCGE
H08_orf287 MG320 (cytochrome C oxidase polypeptide I (CtaD); BACSU)


H08_orf314 


MG315 


MG315 homolog. MYCGE 


H08_orO45 


MG307 


MG307 homolog. MYCGE 
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riuo_onjoy 


MG3 16 


(competence locus E (comE3); BACSU) 


I40.Q nrfAAQ 


MG314 


MG314 homolog, MYCGE 


HUo_orij/Zo 


MG307 


MG307 homolog, MYCGE 


iiuo_orr j y i 


MG321 


MG321 homolog. MYCGE 


nU5_on / zo 


MG307 


MG307 homolog, MYCGE 


HIO nrftdO 


kJtfVJI I 
VfWjLl 1 


MLiZl 1 homolog, MYCGE 


n i u_on i y o 


MUZUS 


MG208 homolog, MYCGE 


HIO nrf?ftR 


VAC.') 1 A 


hypothetical protein P35155 homolog; BACSU 




MG209 


hypothetical protein YceC homolog; ECOLI 


uq i ftr nn 


MU24S 


MG248 homolog; MYCGE 


H7 i_ori£*t4 


MG243 


MG243 homolog, MYCGE 


UOI nrDlQ 

ny i_onz jy 


M0247 


hypothetical protein YgiH homolog; ECOLI 


nvi_onzjo 


MUZjO 


MG256 homolog. MYCGE 


HOI nrf>Rl 


MUx4o 


M024o nomoiog, MYCGE 


tiyl_OTlJJH 




fcjf^^CC Ljh M _t A _ %.J"\J^^*r* 

MG255 homolog. MYCGE 


HQ t nt-fATT 

rty i_ono/ / 


MG260 


MG260 nomoiog, MYCGE 


VA/4 nrflfn 


MG105 


MG105 homolog. MYCGE 


VAyl .-nil 

KU4_0nZZZ 


MGiOl 


MG101 homolog, MYCGE 




Mul 1U 


hypothetical protein YjeQ homolog; ECOLI 




MG103 


MG103 homolog. MYCGE 


MJ3_0n I Oy 


MG459 


hypothetical protein HI0671 homolog; HAEIN 






MU44V nomoiog, MYCGE 


ivuj_oriz j / 


MG450 


degV homolog protein; BACSU 


K.Uj_ort/j I 


MG452 


MG452 homolog, MYCGE 


Mjj_onz/ 1 




M0442 homolog, MYCGE 


I\AJj_0n.J4j 


MG456 


MG456 homolog, MYCGE 


VAC nr no< 


MG464 


hypothetical protein 1 (S42122); MYCCA 


K.Uj_0n4Ul 


MG443 


hypothetical protein (P27712); SPICI 


K05_orf425 


MG461 


MG461 homolog, MYCGE 


K.05_On4y9 


MG447 


MG447 homolog, MYCGE 


rUl_On 1U3J 


MG328 


MG328 homolog, MYCGE 


rUi_ortiy / 


MG333 


hypothetical protein HI 1366 homolog; HAETN 


PUl_0n2uy 


MG331 


MG331 homolog, MYCGE 


P01_orf235 


MG332 


hypothetical protein M0315 homolog; HAEIN 


PUl_on293 


MG326 


degV homolog protein; BACSU 


P01_orf34 1 


marginal MG025 


hypothetical protein YibD homolog; ECOLI 


rU£_0rtl4U 


MG337 


MG337 homolog, MYCGE


rU2_ortzio 


* 


hypothetical protein YjfV homolog; ECOLI 


FU2_on305 




hypothetical protein YjfW homolog; ECOLI 


P02 orTilfi 


MG338 


MG338 homolog, MYCGE


P02_orf408 


MG336 


, nitrogen fixation protein (nifS); HAEIN 


P02_orf427 


MG288 (MG096) 


MG288 homolog, MYCGE 


P02_orf458 


MG096 (MG288) 


MG096 homolog. MYCGE 


P02_orf509 


MG288 (MG096) 


MG288 homolog, MYCGE 


P02_orf660 




hypothetical protein YjfS homolog; ECOLI 


R02_orfl386V 


MG064 


MG064 homolog. MYCGE 


R02 orfl47 


MG260 


MG260 homolog, MYCGE 


R02_orf469 


MG061 


MG061 homolog, MYCGE 


R02_orf524 


MG068 (MG067) 


MG068 homolog, MYCGE 


VXpSPT7_orf269 


MG145 


hypothetical protein (YaaQ homolog; PSEFL 


VXpSPT7_orf377 


MG147 


MG147 homolog, MYCGE 


VXpSPT7_orf402 


MG144 


MG144 homolog, MYCGE 


VXpSPT7_orf445 


MG148 


MG148 homolog, MYCGE 



• no classification so far [86] 

Al9_orflI40 

A19_orfl29 

A19_orf204 

A19_orf229V 

A19_orf591 

A65_orfll5 

A65_orfll8 

B01_orfl03b 

B0I_orfll6L 

B01_orfl47 

b01_orfI821 

B01_orf274 

C09_orfl30b 

C09_orfl40o 

C09_orfl65 

C09_orfl72 

C09_orf223 

C09_orf251 

C09_orf404 

C09_orf422 

C09_orH18 

CI2_orfl81o 

C12_orf247 

D02_orfl00 

D02_orfl09 

D02_orfl22a 

D02_orfl22b 

D02_orfl28 

D09„orfl27a 
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D12_orfl3l 
D12_orf235 
D12_orf257 
E07_orfl33 
E07_orfl40 
E07_orfl63 
E07_orfl66 
E07_orfl75 
E07_orfl79 
E07_orf228 

E09_orfl36L marginal MG440 

E30_orf352 
F04_orfl20 
FO4_orfl50 
F10_orf218 

FlO_orf357 marginal MG01 1 

F10_orf565 
Fl(Lorf741 
Fll_orfI48o 
Fll_orf879 
G12_orfl40b 
G12_orfl68 
G12_orf225 
GT9_orfll3 
H03_orfl52 
H08_orfl02 
H10_orfU9 
H10_orf206 
H10_orf220L 
H91_orfU5 
H91_orfl80 
H91_orf2l6 
K05_orfl01a 

K05__orfl06 - . 

K05_orfl882 marginal MG064 

K05_orf250 

P01_orfl40 

P01_orfl99 

P01_orf243 

P02_orfl03b 

P02_orfl26 

P02_orfl43 

P02_orfl47 

P02_orfl63 

P02_orfl96 

P02_orf253 

P02_orf474 

R02_orflOl 

R02_orfl05 

R02_orfl40 

R02_orfl50 

R02_orfl83o 

R02_orf254 

R02_orf264 

R02_orf329 

R02_orf440 

VXpSPT7_orfll2 

• hypothetical ORFs derived from repetitive

A05_orfl39 

A19_orf2U 

A65_orfll5 

B01_orfl47 

C09_orfl40o 

C09_orfl49a 

E07_orfl63 

Fll_orfl48o 

G12_orfi68 - 

H08_orfl57a marginal MG321 

H91_orfl80 

P01_orfl99 

P02_orfl03b 

P02_orfI96 

R02_orfl38 

R02_orfl40 

R02_orfl83o 

C09_orfl49b 

H08 > orf329V MG321 
A65 orf465V MG191 
E07lorf413 MG191 
E07 orf256L MG191 
A05~orf278 MG191 
H08lorf270 MG191 
P02_orf422V MG191 



DNA elements [46] 



adhesin P1 (group 2) homolog; MYCPN
adhesin P1 (group 2) homolog; MYCPN
adhesin P1 (group 2) homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
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Table 1. Continued 



ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
hypothetical 28K protein (P1 operon) homolog; MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN



• RNA - tRNA [33 tRNAs in 14 genes/operons]
Arg-tRNA gene (CGA); MYCPN
Arg-tRNA gene (CGC); MYCPN
Arg-tRNA gene (AGA); MYCPN
Asn-tRNA(AAC), Glu-tRNA(GAA), Thr-tRNA(ACG), Val-tRNA(GTA), Thr-tRNA(ACA), Lys-tRNA(AAG), Leu-tRNA(CTA) genes; MYCPN
Cys-tRNA(TGC), Pro-tRNA(CCA), Met-tRNA(ATG), Ile-tRNA(ATG), Ser-tRNA(TCA), fMet-tRNA(ATG), Asp-tRNA(GAC) and Phe-tRNA(TTC) genes; MYCPN
Gly-tRNA(GGC) gene; MYCPN
His-tRNA(CAC) gene; MYCPN
Ile-tRNA(ATC), Ala-tRNA(GCA) genes; MYCPN
Thr-tRNA(GGU) gene; MYCPN
Ser-tRNA(AGC) gene; MYCPN
Ser-tRNA(TCC), Ser-tRNA(TCG) genes; MYCPN
Trp-tRNA(TGA) gene; MYCPN
Trp-tRNA(TGG) gene; MYCPN
Tyr-tRNA(TAC), Glu-tRNA(CAA), Lys-tRNA(AAA), Leu-tRNA(TTA), Gly-tRNA(GGA) genes; MYCPN

• RNA - other [3] 
4.5S RNA; MYCPN 
10Sa RNA; MYCGE
RNaseP RNA; MYCGE 
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mrniot 
iviuiyi 
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R02_orf347L 


MG191 




\Ani7t 
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MGI92 


R02_orf301 




R02_brfl73 


MG192 


H08_orf445 


MG192 


P02_orf381 


(MG192) 


H9l_orf322 


MG192 


H91_orf272 


MG192 


• RNA - rRNA [3] 




5S rRNA 




16S rRNA




23S rRNA 








MG is the name of the corresponding ORF in M.genitalium (9). 



coding densities have also been estimated for the smaller M.genitalium
genome (9) and for the genome of Haemophilus influenzae, which is
more than twice as large (30). The length of the proposed proteins in
M.pneumoniae ranges from 37 (4.3 kDa) to 1882 (209.4 kDa) amino acids
(Fig. 3). One of the largest proteins is the cytadherence accessory
protein HMW2 (F10_orf1818) and the smallest identified protein is the
37 amino acid ribosomal protein L36 (GT9_orf37). For practical reasons
we introduced, at the beginning of the sequence analysis, a cut-off
point of 100 amino acids for proposed proteins, unless smaller
proteins, such as some of the ribosomal proteins, were found during
the initial BLASTX homology search. All intergenic or non-coding
regions were reanalyzed with a cut-off point of 50 amino acids, and
searches were done for specific small proteins. However, we cannot
exclude the possibility that some of the smaller proteins that show no
similarity to known proteins from other organisms have been missed in
our analysis.
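
The length cut-off described above can be illustrated with a simple ORF scan. This is only a sketch and not the procedure used for the analysis, which relied on BLASTX homology searches. The function name `find_orfs`, the restriction to the forward strand, the use of ATG as the only start codon, and the treatment of TGA as a sense codon (suggested by the Trp-tRNA(TGA) gene listed in Table 1) are all assumptions made for the example.

```python
# Assumption: only TAA and TAG terminate translation; TGA is read as Trp.
STOPS = {"TAA", "TAG"}


def find_orfs(seq, min_aa=100):
    """Return (start, end, aa_length) for ATG-initiated ORFs on the forward strand.

    start and end are 0-based, end-exclusive, and the span includes the stop codon.
    ORFs shorter than min_aa amino acids are discarded (the length cut-off).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3      # codons before the stop codon
                if aa_len >= min_aa:
                    orfs.append((start, i + 3, aa_len))
                start = None                   # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 5 + "TGA" + "GCT" * 4 + "TAA"
    print(find_orfs(toy, min_aa=10))    # the internal TGA does not end the ORF
    print(find_orfs(toy, min_aa=100))   # nothing passes the 100 amino acid cut-off
```

Lowering `min_aa` from 100 to 50 corresponds to the second pass over intergenic regions mentioned in the text.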

The codon usage of M.pneumoniae is summarized in Table 3. We
compared it for all proposed genes, for the subsets of genes with a
low G+C content (below 35 mol%) and a high G+C content (between
50 and 56 mol%), and for all 50 ribosomal protein genes (42.8 mol%)
as an example of frequently translated genes. Codon usage in the
low and high G+C content subfractions is clearly influenced by the
DNA composition, favouring codons with either A/T or G/C,
respectively, at the third position. The codon usage pattern also
differs between the complete genome and frequently expressed genes
such as those coding for ribosomal proteins.

The most frequently used codons are AUU (Ile, 4.6%), AAA
(Lys, 4.6%), UUU (Phe, 4.3%), GAA (Glu, 4.2%) and UUA
(Leu, 3.9%), and the most common amino acids are Leu (10.3%),
Lys (8.5%), Ile (6.6%), Ala (6.6%) and Val (6.5%). The high
value for Lys is in agreement with the relatively high percentage of
proposed proteins with calculated isoelectric points between pH
9 and 12 (Fig. 4). The least frequently used codons are UGC (Cys,
0.2%), CGA (Arg, 0.25%), AGG (Arg, 0.29%), AGA (Arg,
0.4%) and UGU (Cys, 0.55%).
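
Codon frequencies of the kind quoted above, and the third-position G/C bias mentioned for the low and high G+C subsets, can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names `codon_usage` and `gc3` are hypothetical, the input is taken to be a single in-frame coding sequence written as DNA, and the toy sequence is not taken from the genome.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency (%) of each codon in one in-frame coding sequence.

    Codons are reported in RNA notation (U instead of T), as in the text.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3           # ignore an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


def gc3(cds):
    """Percentage of complete codons with G or C at the third position."""
    cds = cds.upper()
    thirds = cds[2::3][: len(cds) // 3]
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    toy = "ATGAAAAAATTTGAA"                    # AUG AAA AAA UUU GAA
    print(codon_usage(toy))
    print(gc3(toy))
```

For a whole gene set, the counts would be pooled over all coding sequences before converting to percentages, so that long genes are not under-weighted.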

All M.pneumoniae gene products were classified (Tables 1 and 2),
with some minor modifications, in accordance with criteria
introduced for Escherichia coli (31) and adapted for the
classification of putative genes from H.influenzae. We added
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Table 2. Summary of the functional classification of the ORFs 



• Biosynthesis of cofactors, prosthetic groups and carriers
    Folic acid                                                          5
    Heme and porphyrin                                                  1
    Thioredoxin                                                         2
• Cell envelope                                                        54
    Membranes, lipoproteins and porins                                 42
    Surface structures and cytadherence
    Surface polysaccharides, lipopolysaccharides and antigens
• Cellular processes                                                   20
    Cell division                                                       2
    Cell killing                                                        1
    Chaperones
    Detoxification
    Protein and peptide secretion
• Central intermediary metabolism
    Other
    Phosphorous compounds
• Energy metabolism                                                    39
    Aerobic
    Amino acids and amines
    Anaerobic
    ATP-proton motive force interconversion
    Glycolysis                                                         10
    Pentose phosphate pathway                                           2
    Pyruvate DHase
• Fatty acid and phospholipid metabolism
                                                                       18
    2'-Deoxyribonucleotide metabolism
    Nucleotide and nucleoside interconversions
    Purine ribonucleotide biosynthesis
    Salvage of nucleosides and nucleotides                              g
    Sugar-nucleotide biosynthesis and conversions                       2
• Pyridine nucleotide metabolism                                        1
• Regulatory function                                                   g
• Replication                                                          46
    DNA replication, restriction, modification, recombination and repair   46
• Transcription
    Degradation of RNA                                                  2
    RNA synthesis, modification and DNA transcription                  11
• Translation                                                          99
    Amino acyl tRNA synthetases and tRNA modification                  24
    Degradation of proteins, peptides and glycopeptides                 8
    Protein modification and translation factors                       15
    Ribosomal proteins: synthesis and modification                     52
• Transport and binding proteins                                       44
    ABC transport                                                      34
    PTS transport                                                       7
    Other transport systems                                             3
• Other categories                                                    191
    Adaptations and atypical conditions                                 3
    Other                                                             188
• hypothetical ORFs derived from repetitive DNA elements               46
• no classification so far                                             86
• RNA                                                                  39
    rRNA                                                                3
    tRNA                                                               33
    other                                                               3




'cytadherence associated proteins' to the category of cell
envelope-surface structures, since evidence is mounting that
M.pneumoniae possesses a cytoskeleton-like organization which
stabilizes the bacterium and protects it against osmotic lysis (2).
The category of transport and binding proteins was subdivided
into three groups, namely PTS, ABC and other transport systems.
To facilitate orientation on the gene map we added a list which
contains all proposed ORFs and RNAs in numerical order (Table 4).
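
The genome positions listed in Table 4 can be related to the ORF names with a few lines of code. In the legible rows the number in the ORF name equals the span length in codons minus one (for example, E07_orf224 at 8482..7808 spans 675 nucleotides, i.e. 225 codons), which suggests that the listed span includes the stop codon. That reading, the helper name `parse_position`, and the convention of labelling an ascending pair "+" and a descending pair "-" are assumptions made for this sketch.

```python
def parse_position(text):
    """Parse a 'start..end' genome position into (low, high, strand, aa_length).

    Assumptions: coordinates are 1-based and inclusive, the span includes the
    stop codon, and a descending pair denotes the opposite strand ("-").
    """
    first, second = (int(part) for part in text.split(".."))
    strand = "+" if first <= second else "-"
    low, high = min(first, second), max(first, second)
    nucleotides = high - low + 1
    aa_length = nucleotides // 3 - 1           # drop the stop codon
    return low, high, strand, aa_length


if __name__ == "__main__":
    for position in ("8482..7808", "7325..6924", "71891..72409"):
        print(position, parse_position(position))
```

Comparing the computed length with the number in the ORF name is a quick way to detect rows whose coordinates were damaged during extraction.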

More details on this very general analysis will be made available on
the World Wide Web (http://www.zmbh.uni-heidelberg.de/M_pneumoniae).



Figure 2. Distribution of the G+C content of the coding sequences of all
M.pneumoniae ORFs (x-axis: G+C content, 27-55%; y-axis: number of ORFs).
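
A distribution like the one in Figure 2 can be obtained by binning the G+C content of each coding sequence. The sketch below assumes 2% wide bins labelled by their lower edge, starting at 27%, which is inferred from the axis ticks of the figure; the function names `gc_percent` and `gc_histogram` and the toy sequences are hypothetical.

```python
from collections import Counter


def gc_percent(seq):
    """G+C content of a nucleotide sequence in percent."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_histogram(sequences, bin_width=2, first_bin=27):
    """Count sequences per G+C bin; each bin is labelled by its lower edge."""
    histogram = Counter()
    for seq in sequences:
        offset = int((gc_percent(seq) - first_bin) // bin_width)
        histogram[first_bin + offset * bin_width] += 1
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    toy_orfs = ["AAAAAAAGGG", "AAAAAGGGGG", "AAAAAAAGGC"]   # 30%, 50%, 30% G+C
    print(gc_histogram(toy_orfs))
```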



DNA replication and repair

The central enzyme for DNA replication in bacteria is the DNA
polymerase III holoenzyme (32), which consists of 10 subunits in
E.coli: a DNA polymerase subunit α and nine accessory proteins (ε,
θ, τ, γ, δ, δ', χ, ψ and β). Mycoplasma pneumoniae codes for two
potential α subunits (the gene name in the literature is either dnaE
or polC). Both proposed α subunits, A19_orf872 and B01_orf1443,
differ in length and also in their degree of similarity to the α subunits
from E.coli and Bacillus subtilis. The protein from B01_orf1443
shares the highest similarity with the α subunit from Gram-positive
bacteria, including the motif for a 3'-5' exonuclease activity which
is typical for these bacteria. In contrast, A19_orf872 is most
similar to the α subunit from E.coli and does not contain a 3'-5'
exonuclease domain. The 3'-5' exonuclease activity in E.coli is
encoded by a separate gene (dnaQ), which has not been found in
M.pneumoniae. Of the other subunits which build the DNA
polymerase III holoenzyme in E.coli (32), only the subunits β
(dnaN), δ' (holB), γ and τ (dnaX) are present in M.pneumoniae,
indicating a simplified replication complex compared with the
Gram-negative bacteria E.coli and H.influenzae. Presently, it cannot
be excluded that other proteins replace these subunits in
M.pneumoniae. A true comparison with a phylogenetically more closely
related Gram-positive bacterium like B.subtilis is not possible, since the
Bacillus DNA polymerase III holoenzyme complex has not been
defined as yet and the nucleotide sequence of the entire B.subtilis
genome has not been completed.

Mycoplasma pneumoniae does not code for a DNA polymerase
I (polA)-like DNA repair enzyme. Instead, we find a truncated
polA gene (A19_orf291) comprising only the 5'-3' exonuclease
domain, whereas in E.coli and B.subtilis the polA gene is much
larger and codes for the 5'-3' exonuclease and a 5'-3'
polymerase-specific domain.

Experimental results on DNA polymerase enzymatic activities
in mycoplasmas are confusing. It was claimed that the DNA
polymerase III of Mollicutes lacks the 3'-5' exonuclease
proofreading activity in general (33), and this was taken as an
explanation for the observed genetic instability of many Mollicutes
species (4). Recently, the nucleotide sequence of the polC gene of
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Rgure 3. Distribution of all M.pneumoniae proteins according to their molecular weight. 



Mycoplasma pulmonis and experimental results on enzyme
purification and characterization of enzyme activities were
published (34). The results indicated that the polC gene from
M.pulmonis also codes for a 3'-5' exonuclease, that the size of
the predicted PolC protein, 1435 amino acids, is very similar to the
PolC homolog B01_orf1443 in M.pneumoniae, and that the
polymerase could be inhibited by compounds specific for PolC
proteins of Gram-positive bacteria. Furthermore, the authors
provided some experimental evidence for a second, smaller
enzyme with DNA polymerase activity. Considering the
characterization data of DNA polymerase activities in M.pulmonis and
the nucleotide sequence data on DNA polymerase genes of
M.pneumoniae and M.genitalium (9,35), one can conclude that at
least these three Mycoplasma species have two DNA polymerase
(polC) genes coding for a larger protein (≈1400 amino acids) with
a 3'-5' exonuclease activity and with the highest sequence
similarities to the Gram-positive B.subtilis polymerase III.
Therefore it is unlikely that an increased mutation frequency is caused by
the DNA replication process. The nucleotide sequence of the
smaller Pol III homolog (≈100 kDa) of M.pneumoniae and
M.genitalium (9,35) more closely resembles the polC gene from the
Gram-negative E.coli. This is also emphasized by the absence of
the 3'-5' exonuclease domain in the proposed genes. The gene for
the smaller, Gram-negative-type PolC has not yet been found in
M.pulmonis, but during the purification of the larger PolC a second
polymerase activity lacking exonuclease activity was identified.
The function of the exonuclease-negative DNA polymerase
can only be elucidated experimentally, and it remains to be seen whether
it can substitute for the function of polymerase I (PolA) in
combination with the proposed 5'-3' exonuclease of the truncated
polA gene (A19_orf291). This topic has also been discussed for
M.genitalium (35).

In addition to the DNA polymerase, many more gene products
are necessary for DNA replication, e.g. for initiation, elongation and
termination (32). The most obvious functions missing in
M.pneumoniae according to the sequence analysis are an RNaseH
for primer removal and a protein for the termination of
replication.



The number of genes involved in DNA repair is considerably
smaller in M.pneumoniae than in the 'standard' eubacteria E.coli
and B.subtilis, or even in H.influenzae with its smaller genome.

Mycoplasma pneumoniae codes for only 13 of the genes known
to be involved in excision repair of DNA, recombination and SOS
repair. Thus the genes recB, recC, recD, recG and ruvC involved in
recombination are missing, as well as the genes recN, recO, recQ and
recR involved in SOS repair in E.coli. Nevertheless, a rudimentary
stock of enzymes has been conserved in M.pneumoniae to permit
homologous recombination [RecA, Ssb, PolA (see above), GyrA,
GyrB, RuvA and RuvB] (36), excision repair (37) and a kind of
truncated SOS repair (38). Notably missing is the lexA gene,
which plays a central role in regulating the SOS response, including
the expression of the recA gene, in other bacteria.

We were also unable to find components of the so-called
mismatch-repair system encoded by the mutS, mutL and mutH
genes. Since bacteria which normally carry the mut genes show
reduced genetic stability if these genes are mutated, it seems
likely that the absence of these genes in mycoplasmas causes an
increased mutation rate (65).

Transcription 

The DNA-dependent RNA polymerase of M.pneumoniae is
coded by the conserved genes rpoA (α subunit), rpoB (β subunit),
rpoC (β' subunit) and rpoE (δ subunit). The only sigma factor
found (H91_orf499) shares the highest similarity with the sigma
factor SigA from B.subtilis (39). Presently, not enough experimental
data are available for defining promoter sequences in
M.pneumoniae. The promoters of only three genes/operons have been
determined experimentally by primer extension. These are
the P1 operon (14), the ribosomal RNA operon (40) and
F10_orf405 (27). The -10 region and, to a lesser extent, the -35
region of these three examples are comparable with consensus
promoter sequences in B.subtilis (41). Termination of transcription
seems to be independent of the termination factor Rho, since the
corresponding gene could not be found. Transcription stops on
typical terminator sequences which are short interrupted palin- 
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Table 3. Codon usage of different sets of M. pneumoniae ORFs: all 677 
ORFs; ORFs with a G+C content <35 mol%; codon usage of the 
adhesin PI and ORF6 (high G+C content); ribosomal ORFs as 
examples for frequently expressed proteins 
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dromic regions followed by a run of U residues. The Nus 
transcription termination factors, of which NusA (E07_orf540)
and NusG (D09_orf320) are present, may play a role in the 
termination of transcription. NusB and NusC are absent. NusA is 
involved in termination and NusG in antitermination in other 
bacteria. Finally, GreA promotes elongation by the RNA 
polymerase by utilizing a novel transcript-cleavage reaction (42). 
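
The terminator structure described above, a short interrupted palindrome followed by a run of U residues, can be searched for with a simple pattern scan. The sketch below is an illustration only and was not part of the analysis: the function names, the fixed stem length of 5 nucleotides, the loop length of 3 to 8 nucleotides and the minimum of four T residues (U in the transcript) are arbitrary assumptions, and mismatches in the stem are not tolerated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_terminators(dna, stem=5, loop_range=(3, 8), min_t=4):
    """Return (start, stem_sequence, loop_length) for hairpin + T-run candidates.

    A candidate is a stem, a loop of loop_range[0]..loop_range[1] nucleotides,
    the reverse complement of the stem, and then at least min_t T residues.
    """
    dna = dna.upper()
    hits = []
    last_start = len(dna) - 2 * stem - loop_range[0] - min_t
    for i in range(last_start + 1):
        left = dna[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            right_start = i + stem + loop
            end = right_start + stem           # first base after the hairpin
            if dna[right_start:end] == revcomp(left) and dna[end:end + min_t] == "T" * min_t:
                hits.append((i, left, loop))
                break                          # report the shortest loop only
    return hits


if __name__ == "__main__":
    toy = "CC" + "GCCGC" + "TTAA" + "GCGGC" + "TTTTTT" + "AA"
    print(find_terminators(toy))
```

A realistic search would also score stem stability and allow mismatches, so hits from this scan should be treated as candidates to inspect rather than as predictions.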
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Figure 4. Distribution of all M. pneumoniae proteins according to their 
predicted isoelectric point (IP). 



Gene expression and regulation 

Regulation of gene expression in M.pneumoniae has not been
studied so far. Therefore we do not know how this bacterium
coordinates the synthesis of those gene products which are
essential for reproduction. Also, M.pneumoniae has to sense and
respond to environmental changes. This requires a signal
transduction system. The presence of only one sigma factor
(sigA, H91_orf499), which is also the only one of all proposed
proteins showing the characteristic helix-turn-helix (HTH)
motif, suggests that the response to external stimuli is not
controlled by the level of expression of alternative sigma factors.

The presence of a conserved palindromic repeated
sequence in front of four heat shock genes, similar to the 'CIRCE'
element first identified in B.subtilis (43), and the identification of
the proposed repressor (C09_orf351, hrcA) indicate that the
heat shock response in M.pneumoniae is regulated by the
interaction of this repressor with the CIRCE element, and
provide an example of negative regulation of gene expression
in M.pneumoniae.

The two-component signal transduction system (44), consisting
of a sensor and a response regulator, which has been found in
many prokaryotic and eukaryotic organisms, is believed to be
essential for all cells. Nevertheless, based on sequence similarity,
we were unable to detect any such system in M.pneumoniae.

Concerning other proteins with regulatory functions, we
identified several GTP-binding proteins and other proteins like
the virulence-associated protein vacB (K04_orf726). These
regulatory proteins act by unknown mechanisms.

Translation

The translation machinery of M.pneumoniae is rather extensive.
About 15% of all proposed ORFs are involved in translation,
including 19 tRNA synthetases, 50 ribosomal proteins, various
factors and enzymes, 33 tRNAs, one ribosomal RNA operon with
one copy each of 5S, 16S and 23S rRNA (45), and a gene coding
for the 10Sa RNA. The conservation of the 10Sa RNA, which
functions as tRNA and mRNA and is implicated in trans-translation
(66), is interesting in evolutionary terms. Three exceptions are
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Table 4. List of the proposed ORFs, RNAs and REPs in numerical order starting with E07_orf540o on the gene map (Fig. 1) 
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080 
081 
082 



Genome Position 

663"S15435 (re!) 

4081. .740 

6641 ..4257 

7325..6924 

8482..7808 

8620..7896 

9614..8310 

I0589..10I67 

12589..11132 

13393.. 1 2596 

14250.. 1 37 11 

15843..I4602 

16274.. 14754 

16944..16417 

20717..17061 

20717..18017 

23560..21760 

25606..20723 

25606..24060 

26593..25619 

26823-27091 

26B44..27335 

2757Z.28072 

28321-29007 

30544..29585 

3 1505. JOS 16 

33258..31498 

34187-33282 

35192-36457 

35415..34645 

36396-35731 

37389..37148 

37422..37000 

38383-37821 

38832. .38383 

39981. .39532 

40650..39538 

41980..4143B 

42851. .42372 

44647.-42887 

44679..45734 

48O90..45721 

49997..4S090 

50032..50105 

50488..50123 

51141..50488 

53896..5 U64 

54231..54662 

55020.. 54637 

55210..55031 

55821..55216 

57713-55911 

58374..57703 

593I5..58923 

61443-60175 

64103..61947 

64524..64027 

66418..65204 

67175..66420 

69705..67288 

70733..69708 

7 1881-71 567 

71891..72409 

73896..73078 

74668.-72883 

75998.-747 1 2 

76039-74736 

76973.-7669 1 

77006..76455 

78388-77345 

79072-77697 

79517..79074 

81440..79815 

824I0..81616 

83174..82410 

83460..83358 

86408..83682 

88 1 55-86632 

90I77..89755 

90202-89903 

91516..90611 

91892..91371 

92626-922 10 

92692-90643 

93692.-92703 

94854.-93847 

95651. .95346. 

97118-96666 

97607..97290 

99191..97869 

100872-99298 

102523..100922 

104479..102533 

105897..1O45O0 

110057.. 105897 

111 196..110294 

1 13273-1 11189 

113324-113412 

1 13856..1 15265 

U547I-1I7I65 

118I16..1 17217 

1 18123 .1 18566 

118373-119539 



Name

E07_orf540o 

E07_orflll3 

E07_orf794 

E07_orfl33 

E07_orf224 

REPMP5 

E07.orf434 

E07_orfl40 

E07_orf485 

E07_orf265 

E07_orfl79 

E07„orf413 

REPMP273 

E07_orfl75 

E07„orfl218 

REPMP5 

REPMP2/3 

E07_orfl627 

REPMP4 

E07_orf324 

REPMP1 

E07_ocfl63 

E07_orfl66 

E07_orf228 

E07_orf319 

E07_ori329 

E07_orf586 

E07_orf301 

REPMP2/3 

E07_orf256L 

E07_orf221V 

REPMP1 

O09_orfl40o 

REPMP2/3 

C09_orfl49b 

C09_orfl49a 

REPMP4 

C09 off 1 80 

C09_orfl59 

O09_orf586L 

C09_orf351 

C09 orf789 

O09_orf635 

mptgt 

O»_orfl21 

C09_orf217 

CO9_orf9l0 

C09_orfl43b 

C09_orfl27 

C09_oif59 

C09_orf201 

C09 off 600 

C09_ccf223 

CO9_orfl30b 

C09_orf422 

C09_orf7!8 

C09_ocfl65 

C09_orf404 

C09_orf25I 

C09_orf805 

O09_orf34I 

C09_orflO4 

C09_orfl72 

C09_orH72 

REPMP5 

C09_orf428V 

REPMP4 

REPMP1 

R02_orfl83o 

R02_orf347L 

REPMP2/3 

R02_orfl47 

R02_orf54l 

R02_Off264 

R02_orf254 

5$rRNA 

23* rRNA 

16srRNA 

R02_orfl40 

REPMP1 

R02_orf301 

R02_orfl73 

R02_CffI38 

REPMP5 

R02_orf329 

R02_orf335 

R02_orfl01 

R02_orfl50 

R02_orfl05 

R02_orf440 

R02„orf524 

R02_orf533 

R02_orf64S 

R02_arf465 

R02„orf 1386V 

R02_orf300 

R02_orf694 

mptgsb 

R02_orf469 

R02_orfS64o 

D09_orf299 

D09_orfl47 

D09_orf388 



Annotation 

N-utilization substance protein A homolog (nusA); BACSU

MG140 homolog, MYCGE

putative lipoprotein, MG260 homolog, MYCGE

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
repetitive DNA sequence REPMP5

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN

MG260 homolog, MYCGE
MG260 homolog, MYCGE

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
repetitive DNA sequence REPMP2/3

hypothetical 130K protein (orf6, P1 operon); MYCPN
repetitive DNA sequence REPMP5
repetitive DNA sequence REPMP2/3
ADP1_MYCPN adhesin P1 (orf5, P1 operon); MYCPN
repetitive DNA sequence REPMP4
hypothetical 28K protein (orf4, P1 operon); MYCPN
repetitive DNA sequence REPMP1



sn-glycerol-3-phosphate transport system permease protein (ugpE); ECOLI
sn-glycerol-3-phosphate transport system permease protein (ugpA); ECOLI
sn-glycerol-3-phosphate transport system permease protein (ugpC); ECOLI
putative lipoprotein, MG186 homolog, MYCGE
repetitive DNA sequence REPMP2/3
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
repetitive DNA sequence REPMP1

repetitive DNA sequence REPMP2/3
adhesin P1 (group 2) homolog; MYCPN



repetitive DNA sequence REPMP4

MG207 homolog, MYCGE

excinuclease ABC subunit C (uvrC); BACSU

protein (hrcA) homolog; BACSU

topoisomerase IV subunit A (parC); BACSU

topoisomerase IV subunit B (parE); BACSU

Thr-tRNA(GGU) gene; MYCPN

MG202 homolog, MYCGE

heat shock protein GrpE; HAEIN

DnaJ homolog protein; MYCCA

MG199 homolog, MYCGE

ribosomal protein L20 (rpl20); MYCFE

ribosomal protein L35 (rpL35); BACST

translation initiation factor IF3 (infC); MYCFE

carnitine palmitoyltransferase II precursor (cpt2); HUMAN



phenylalanyl-tRNA synthetase beta chain (pheT); BACSU
phenylalanyl-tRNA synthetase alpha subunit (pheS); BACSU
(MG191 homolog, MYCGE)

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN

repetitive DNA sequence REPMP5

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN

repetitive DNA sequence REPMP4

repetitive DNA sequence REPMP1

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN

repetitive DNA sequence REPMP2/3

MG260 homolog, MYCGE

putative lipoprotein, MG260 homolog, MYCGE



5S rRNA
23S rRNA
16S rRNA

repetitive DNA sequence REPMP1

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN

repetitive DNA sequence REPMP5

type I restriction enzyme EcoKI specificity protein (hsdS) homolog; HAEIN



MG068 homolog, MYCGE

putative lipoprotein, MG067 homolog, MYCGE

transketolase 1 rJX 1; (ktB); RHOSH

glutamine transport ATP-binding protein (glnQ); ECOLI

MG064 homolog, MYCGE

1-phosphofructokinase (fruK); HAEIN

fructose-permease IIBC component (fruA); ECOLI

Ser-tRNA gene (AGC); MYCPN

MG061 homolog, MYCGE

hexosephosphate transport protein (uhpT); SALTY

hypothetical protein (ywdF) homolog; BACSU

hypothetical protein (A43259) homolog; ENTHR

phosphoribosylpyrophosphate synthetase (prs); SYNP



t 
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Table 4. Continued 



083 
084 
085 
086 
087 
088 
089 
090 
091 
092 
093 
094 
095 
096 
097 
098 
099 

100 

101 

102 

103 

104 

105 

106 

107 

108 

109 

110 

111 

112 

113 

114 

115 

116 

117 
118 
119 
120 
121 
122 
123 
124 
125 
126 
127 
128 
129 

130 

131 

132 

133 

134 

135 

136 

137 

138 

139 

140 

141 

142 

143 

144 

145 

146 

147 

148 

149 

150 

151 

152 

153 

154 

155 

156 

157 

158 

159 

160 

161 

162 

163 

164 

165 

166 

167 

168 

169 

170 

171 

172 

173 

174 

175 

176 

177 

178 

179 

180 

181 

182 

183 

184 



11 95 18.. 120054 
120036..1 20866 
120853.. 121 236 
121404.121781 
121789..122751 
124383-122719 
124774.. 1 24373 
126050..1 24785 
1267 11. .126037 
I27431..126715 
127487..128839 
130278.. 129 127 
131221..130262 
132678..131221 
I33523..I32663 
134376.. 133516 
136060..1 34378 
137837 ..137466 
139642..1 39376 
141 633..139660 
141816..142970 
142961..144487 
146845.. 144947 
I48578..I47022 
150522..149167 
152 17 1. .150498 
I53387..152143 
1534 14.. 153989 
154830..1 54036 
157172..155154 
157794..1 57234 
158048..1 58359 
159270..158254 
159672..t60020 
160267..160532 
160694..160251 
162883.. 1 60862 
165055..163055 
165333..169664 
169788..170324 
170328..170654 
171489..170878 
I71995..171489 
172485..171913 
173405..172506 
I73438..174262 
175353..174265 
176220..1 75354 
176660..176220 
178219..176681 
179148..178219 
1803O4..179132 
183442..1 80350 
185356..183452 
187139..185268 
187233..187390 
187475..1 88284 
188259..189125 
189 125..1 89982 
I90597..189959 
191472..190699 
192199..192906 
19293 1..193626 
194207..193812 
195189..194404 
1965 17.. 195 189 
197280..196519 
197885..197253 
199152..197890 
201643..199124 
203595.101643 
204626.103697 
205772.104630 
206520..207332 
207319.108071 
20807 1..209390 
209458.110312 
210318.115966 
215968..216987 
217010..217156 
217146.117502 
217483.118640 
218633.119424 
219411.120865 
220846..222I23 
223000.122680 
223391.123696 
225039.124101 
225210.125719 
225719.126246 
226427.128556 
229109.130146 
231385.130186 
231411.131833 
232705.131830 
233448.132693 
233533.134717 
234876.135589 
235596.136300 
236264.136719 
236870.138369 
238451.138717 
238783.139415 
239399.139758 



D09_orfl78 

D09_orf276 

D09_orfl27a 

D09_orfl25 

D09_orf320 

D09_orf554 

D09_orfl33 

D09_<xf42l 

D09_ort224 

D09_orf238 

D09_orf450 

D09_orf383 

D09_orf3l9 

D09_0ff485 

D09_orf286i 

D09_orf286b 

D09_orf560L 

D09_orfl23 

D09_orf88 

D09_orf657 

D09_orf384 

D09.off508 

D09_orf632 

D09_orf518 

D09_orf45l 

D09_orf557 

B01_Off414o 

B0l_orfl9l 

B01_crf264 

B01_orf672 

B01_otfl86L 

B01_orfl03b 

B01_or038 

B01.orfU6L 

REPMP1 

B01_CffU7 

B0l_orf673 

BO I orf 666 

B01_orfl443 

B01_orfl78 

BOl.orflOS 

BOl_orf203 

B01_orfl68 

B0l_orfl90 

B01_orf299V 

B01_orf274 

B01_crO62 

B01„ori288 

B01 ©rfl46 

B01_orf512 

B0l_orf309 

D12jorf390o 

Dl2jorfl030 

D12_orf634 

D12_orf623 

mptgi 

D12_orf269 
Dl2_orf288 
Dl2_ort285 
D12_orf212 
D12_orf257 
Dl2_orf235 
D12 orf231 
Dl2_orfl31 
Dl2_orf261 
D12_Off442 
D12_orf253 
D12_otf210 
D12_orf420 
K05_orf839o 
K05_orf650 
K05_orf309 
K05_orf380 
K05_orf270 
K05_orf250 
K05_orf439 
K05_orf284 
K05_orfl882 
K05_off339 
K05_ocf48 
K05_oifll8V 
K05_orf385 
K05.orf263V 
K05_orf484 
K05_orf425 
K05_orfl06 
K05_orfl0la 
K05_orOl2 
KQ5_orfl69 
K05_orfl75 
K05_orf709 
K05_OffM5 
K05_orO99 
KQ5_orfl40 
K05_orf29I 
K05_orf251 
KQ5_orf394 
KQ5_orf237 
K05_orf234 
K05_orfI51 
K05_orf499 
KQS.orfSS 
K05_orf2l0 
K05_orfll9 



hypothetical protein (yabF) homolog; BACSU 
hypothetical protein (yabO homolog; BACSU 

MG055 homolog, MYCGE 
transcription antitermination factor (nusG); BACSU 
phosphomannomutase (cpsG); MYCPI 
cytidine deaminase (cdd); MYCPI 
thymidine phosphorylase (deoA); MYCPI 
deoxyribose-phosphate aldolase (deoC); MYCPN 
purine-nucleoside phosphorylase (deoD); ECOLI 
signal recognition particle protein (ffh); MYCMY 
S-adenosylmethionine synthetase 2 (metX); ECOLI 
O-sialoglycoprotein endopeptidase (gcp); PASHA 
putative lipoprotein, MG045 homolog, MYCGE 
spermidine/putrescine transport system permease (poll); ECOLI 
spermidine/putrescine transport system permease (potB); HAEIN 
spermidine/putrescine transport ATP-binding prot (potA); ECOLI 
putative lipoprotein 

phosphocarrier protein HPr (ptsH); MYCCA 

putative lipoprotein, MG040 homolog, MYCGE 

aerobic glycerol-3-phosphate dehydrogenase (glpD); ECOLI 

glycerol kinase (glpK); HAEIN 

MG288 homolog, MYCGE 

MG096 homolog, MYCGE 

pre-B cell enhancing factor homolog (pbcF); HUMAN 
aspartyl-tRNA synthetase (aspS); THEAQ 
histidyl-tRNA synthetase (hisS); STREQ 
thymidine kinase (tdk); BACSU 
glycerol uptake facilitator (glpF); BACSU 
MG032 homolog, MYCGE 
MG032 homolog, MYCGE 

MG032 homolog, MYCGE 

repetitive DNA sequence REPMP1 

MG032 homolog, MYCGE 

MG032 homolog, MYCGE 

DNA polymerase III (dnaE) alpha chain (3'-5' exonuclease); BACSU 

uracil phosphoribosyltransferase (upp); STRSL 

hypothetical protein (gi: 606093) homolog; ECOLI 

MG028 homolog, MYCGE 

MG027 homolog, MYCGE 

elongation factor P (efp) homolog; HAEIN 

TrsB protein; YEREN 

hypothetical protein (yyaF) homolog; BACSU 

fructose-bisphosphate aldolase (tsr); BACSU 

DNA-directed RNA polymerase delta subunit (rpoE); BACSU 

methionyl-tRNA synthetase (metS); BACST 

proline iminopeptidase (pip); NEIGO 

heat shock protein DnaJ; BACSU 

hypothetical helicase (yb95) homolog; YEAST 

transport ATP-binding protein (msbA); HAEIN 

transport ATP-binding protein (pmd1); SCHPO 

lle-tRNACATQ, Ala-tRNA(CCA) genes; MYCPN 

5,10-methylene-tetrahydrofolate dehydrogenase (mujlfc HAEIN 

ribosomal protein S6 modification protein (rimK); ECOLI 

MG011 homolog, MYCGE 

DNA primase motif (dnaG); CLOAB 



putative lipoprotein 

hypothetical protein (yabD) homolog; BACSU 

possible thiophene and furan oxidation protein (thdF); BACSU 

DNA polymerase III subunit delta' (holB); ECOLI 

thymidylate kinase (CDC8) homolog, MYCGE 

seryl-tRNA synthetase (serS); BACSU 

DNA gyrase subunit A (gyrA); STAAU 

DNA gyrase subunit B (gyrB); MYCPN 

DnaJ homolog protein; YEAST 

DNA polymerase III beta subunit (dnaN); STAAU 

protein (soj) homolog; BACSU 

chromosomal replication initiator protein (dnaA); MYCCA 
sulfate transport ATP-binding protein (cysA); SYNP 

protein (devA) homolog; ANASP 
ribosomal protein L34 (rpL34); PROMI 
RNaseP C5 chain (rnpA); MYCCA 
hypothetical protein 1 (S42122); MYCCA 

sSS5«*^ dimethyttransferase (ksgA); ECOU 

glutamyl-tRNA synthetase (gltX); BACST 
MG461 homolog, MYCGE 

L-lactate dehydrogenase (ldh); MYCHY 

hypothetical protein (HI0671) homolog; HAEIN 

hypoxanthine-guanine phosphoribosyltransferase (hpt); LACLA 

cell division protein (ftsH); BACSU 

MG456 homolog, MYCGE 

tyrosyl-tRNA synthetase (tyrS); BACCA 

osmotically inducible protein (osmC); ECOLI 

UDP-glucose pyrophosphorylase (gtaB); BACSU 

MG452 homolog, MYCGE 

elongation factor TU (tuf); MYCGE 

homolog (degV) protein; BACSU 

MG449 homolog, MYCGE 

pUB homolog (fragment); HAEIN 

MG447 homolog, MYCGE 

ribosomal protein S16 (BS17); BACSU 

tRNA (guanine-N1)-methyltransferase (trmD); HUMAN 

ribosomal protein L19 (rpL19); BACST 



Table (continued) 



185  239774..240979  K05_orf401 
186  240948..241763  K05_orf271 
187  242850..242236  E09_orf204o 
188  243127..243516  E09_orf129 
189  244320..243889  E09_orf143V 
190  245395..244301  E09_orf364 
191  246521..245382  E09_orf379 
192  247519..247824  E09_orf101 
193  247809..248219  E09_orf136L 
194  249106..249516  E09_orf136 
195  249627..250499  E09_orf290 
196  250522..251355  E09_orf277 
197  251355..252206  E09_orf283a 
198  252209..253060  E09_orf283b 
199  252981..253889  E09_orf302 
200  253889..2547S2  E09_orf279 
201  254731..255561  E09_orf276 
202  255561..256463  E09_orf300 
203  256471..257334  E09_orf287o 
204  258458..257331  E30_orf375 
205  259665..258478  E30_orf395 
206  260219..259665  E30_orf184 
207  261354..260296  E30_orf352 
208  262455..261910  C12_orf181o 
209  263280..262537  C12_orf247 
210  264090..263383  C12_orf235 
211  264988..264092  C12_orf298 
212  265075..266289  C12_orf404 
213  266342..267076  C12_orf244 
214  267069..268595  C12_orf508 
215  268600..270318  C12_orf572 
216  270833..270315  C12_orf172 
217  271393..270968  C12_orf141 
218  271634..271437  C12_orf65 
219  273008..271656  C12_orf450 
220  273166..273426  C12_orf86 
221  273431..275116  C12_orf561 
222  275162.390313  C12_orf839 
223  277659..280505  C12_orf948L 
224  280514..282559  C12_orf681 
225  282590..283030  C12_orf146 
226  283036..283434  C12_orf132 
227  283864..284613  C12_orf249 
228  284699..285703  C12_orf334 
229  285639..286673  C12_orf344 
230  286788..289781  C12_orf997 
231  290023..291180  C12_orf385 
232  291180..293135  C12_orf651V 
233  293120..294109  C12_orf329 
234  294112..294789  C12_orf225 
235  295259..294786  C12_orf157 
236  295314..296684  C12_orf456 
238  297129..298010  C12_orf293o 
237  297163..296690  C12_orf157L 
239  298013..298330  D02_orf105 
240  298333..298956  D02_orf207 
241  298949..299485  D02_orf178 
242  299488..301044  D02_orf518 
243  301044..301883  D02_orf279 
244  301883..303310  D02_orf475 
245  303313..303714  D02_orf133a 
246  303714..305423  D02_orf569 
247  305423..305881  D02_orf152 
248  305799..306167  D02_orf122a 
249  306393..306761  D02_orf122b 
250  306862..308427  D02_orf521 
251  308950..310011  D02_orf353V 
252  310168..310821  D02_orf217L 
253  310962..311435  D02_orf157L 
254  311648..313243  D02_orf531 
255  313301..313753  D02_orf150 
256  313629..314672  D02_orf347 
257  314746..315654  D02_orf302 
258  315716..316123  D02_orf135L 
259  316627..317304  D02_orf225L 
260  317742..319061  D02_orf439 
261  319237..320034  D02_orf265V 
262  320102..320524  D02_orf140 
263  320666..320995  D02_orf109 
264  321313..321011  D02_orf100 
265  321751..322791  D02_orf346 
266  322953..324173  D02_orf406 
267  324608..324994  D02_orf128 
268  325182..325532  D02_orf116 
269  325535..327166  D02_orf543 
270  327180..328517  D02_orf445 
271  328621..330603  D02_orf660 
272  330605..330994  D02_orf129 
273  331116..331442  D02_orf108 
274  331430..332305  D02_orf291 
275  332405..335515  D02_orf1036o 
276  335519..336232  H03_orf237 
277  336402..336860  H03_orf152 
278  337074..338129  H03_orf351 
279  338333..339634  H03_orf433 
280  339627..340373  H03_orf248 
281  341011..340370  H03_orf213 
282  341065..342381  H03_orf438 
-  342382..342432  mptgab 
283  343166..342459  H03_orf235 
284  343695..343120  H03_orf191 
285  345526..343688  H03_orf612 
286  345554..347167  H03_orf537 
287  347210..347791  H03_orf193o 
hypothetical protein (P27712); SPICI 
MG442 homolog, MYCGE 
protein P30; MYCPN 
putative lipoprotein 

PTS system mannitol-specific component IIA (EIIA-MTL) (mtlF); STRMU 
mannitol-1-phosphate 5-dehydrogenase (EC 1.1.1.17) (mtlD); STRMU 
PTS system mannitol-specific component IIA (EIIA-MTL) (mtlA); STACA 
putative lipoprotein 

MG441 homolog, MYCGE 
putative lipoprotein, MG439 homolog, MYCGE 
putative lipoprotein, MG440 homolog, MYCGE 
putative lipoprotein, MG439 homolog, MYCGE 
putative lipoprotein, MG439 homolog, MYCGE 
MG440 homolog, MYCGE 
putative lipoprotein, MG439 homolog, MYCGE 
putative lipoprotein, MG440 homolog, MYCGE 
putative lipoprotein, MG439 homolog, MYCGE 
MG439 homolog, MYCGE 
MG438 homolog, MYCGE 
CDP-diglyceride synthetase (cdsA); HAEIN 
ribosome releasing factor (frr); HAEIN 



uridylate kinase (pyrH); ECOLI 
elongation factor Ts (tsf); SPICI 
hypothetical protein (yfiB) homolog; SPICI 
triosephosphate isomerase (tim); ECOLI 
phosphoglycerate mutase (pgm); BACSU 

PEP-dependent HPr protein kinase phosphoryltransferase (Enzyme I) (ptsI); STRSL 

MG428 homolog, MYCGE 

MG427 homolog, MYCGE 

ribosomal protein L28 (rpL28); BACSU 

ATP-dependent RNA helicase (deaD); HAEIN 

ribosomal protein S15 (BS18); BACST 

MG423 homolog, MYCGE 

MG422 homolog, MYCGE 

excinuclease ABC subunit A (uvrA); ECOLI 

DNA polymerase III subunit gamma and tau (dnaX); ECOLI 

ribosomal protein L13 (rpL13); ECOLI 

ribosomal protein S9 (rpS9); BACST 

restriction-modification enzyme subunit S1B (hsdS); MYCPU 

MG413 homolog, MYCGE 

MG415 homolog, MYCGE 

MG414 homolog, MYCGE 

MG412 homolog, MYCGE 

phosphate transport system permease protein (pstA); ECOLI 

phosphate transport ATP-binding protein (pstB); ECOLI 

phosphate transport system regulatory protein (phoU); ECOLI 

peptide methionine sulfoxide reductase (pmsR); ECOLI 

enolase (eno) (EC 4.2.1.11); PLAFA 

ATP synthase A chain (atpB); MYCGA 

ATP synthase protein I (atpI); MYCGA 

ATP synthase C chain (atpE); MYCGA 

ATP synthase B chain (atpF); MYCGA 

ATP synthase delta chain (atpH); MYCGA 

ATP synthase alpha chain (atpA); MYCGA 

ATP synthase gamma chain (atpG); MYCGA 

ATP synthase beta chain (atpD); MYCGA 

ATP synthase epsilon chain (atpC); MYCGA 

MG397 homolog, MYCGE 

galactose-6-phosphate isomerase subunit (lacA); STRMU 



putative lipoprotein, MG395 homolog, MYCGE 

MG068 homolog, MYCGE 

putative lipoprotein, MG395 homolog, MYCGE 

MG395 homolog, MYCGE 

putative lipoprotein, MG395 homolog, MYCGE 

MG068 homolog, MYCGE 

MG067 homolog, MYCGE 

putative lipoprotein, MG068 homolog, MYCGE 

MG067 homolog, MYCGE 

MG068 homolog, MYCGE 

putative lipoprotein, MG068 homolog, MYCGE 

MG068 homolog, MYCGE 

MG395 homolog, MYCGE 



MG068 homolog, MYCGE 

serine hydroxymethyltransferase (glyA); ACTAC 

heat shock protein GroES; BACSU 

heat shock protein GroEL; BACSU 

nonspecified aminopeptidase; MYCSA 

lactococcin transport ATP-binding protein (lcnDR3); LACLA 

MG389 homolog, MYCGE 

MG388 homolog, MYCGE 

GTP-binding protein era homolog; STRMU 

protein P200; MYCPN 

glycerophosphoryl diester phosphodiesterase (glpQ); STAAU 

NADP-dependent alcohol dehydrogenase (adh); THEBR 
GTP-binding protein (obg); BACSU 

probable NH(3)-dependent NAD(+) synthetase (outB); BACSU 

uridine kinase (udk); HAEIN 

arginine deiminase (arcA); PSEPU 

Arg-tRNA gene (AGA); MYCPN 

MG381 homolog, MYCGE 

glucose inhibited division protein (gidB); ECOLI 

glucose inhibited division protein (gidA); ECOLI 

arginyl-tRNA synthetase (argS); BRELA 

MG377 homolog (put zinc protease), MYCGE 
288  347793..348107  G12_orf104 
289  348107..349801  G12_orf564 
290  349794..350603  G12_orf269 
291  350610..351455  G12_orf281 
292  351442..352605  G12_orf387 
293  352598..353575  G12_orf325 
294  353562..354542  G12_orf326 
295  354597..356273  G12_orf558 
296  356273..357259  G12_orf328a 
297  357249..358097  G12_orf282a 
298  360075..358081  G12_orf664 
299  361010..360075  G12_orf311 
300  361671..361015  G12_orf218 
301  361732..361995  G12_orf87 
302  362178..362005  G12_orf57 
303  362553..362185  G12_orf122 
304  363076..362591  G12_orf161 
305  363194..364432  G12_orf412 
306  365341..364418  G12_orf307 
307  365936..365316  G12_orf206 
308  366364..365942  G12_orf140b 
309  366705..367877  G12_orf390 
310  367885..368733  G12_orf282b 
311  368909..371056  G12_orf715 
312  371463..371053  G12_orf136 
313  371612..371941  G12_orf109 
314  373019..372465  G12_orf184 
315  373074..373751  G12_orf225 
316  374992..374006  G12_orf328b 
317  376214..374973  G12_orf413 
318  376807..377313  G12_orf168 
-  376824..377060  REPMP1 
319  377903..378820  G12_orf305 
-  378870..378945  mptgb 
320  379607..378975  G12_orf210V 
321  380098..379598  G12_orf166b 
322  380141..382726  G12_orf861 
323  382844..383662  G12_orf272V 
324  383665..384711  G12_orf348 
325  385804..386304  G12_orf166a 
326  386397..390572  G12_orf1391o 
327  390576..394448  F04_orf1290 
328  394610..394972  P04_orf120 
329  395489..395941  F04_orf150 
330  396719..397183  P04_orf154 
331  397214..397996  F04_orf260V 
332  398608..399984  P02_orf458 
333  401014..402297  P02_orf427 
334  402844..404373  P02_orf509 
335  405492..404401  P02_orf363V 
336  407993..405612  P02_orf793 
337  408909..409670  P02_orf253 
338  410118..409738  P02_orf126 
339  411833..410688  P02_orf381 
-  412343..410580  REPMP5 
340  413656..412383  P02_orf422V 
-  413701..412404  REPMP4 
341  414691..414101  P02_orf196 
-  414718..414417  REPMP1 
342  416640..415057  P02_orf527V 
-  416770..415161  REPMP2/3 
343  417279..416788  P02_orf163 
344  417961..417233  P02_orf242 
345  418272..418703  P02_orf143 
346  419131..421113  P02_orf660 
347  421405..421884  P02_orf159 
348  421886..422542  P02_orf218 
349  422478..423395  P02_orf305 
350  424958..423534  P02_orf474 
351  425032..426042  P02_orf336 
352  426558..430460  P02_orf1300 
353  431060..430638  P02_orf140 
354  432289..431063  P02_orf408 
356  432878..433828  P02_orf316 
355  432936..432493  P02_orf147 
-  434119..434385  REPMP1 
357  434245..434556  P02_orf103b 
358  436086..435061  P01_orf341 
359  436374..436955  P01_orf193 
360  436939..439455  P01_orf838 
361  439483..440076  P01_orf197 
362  440080..440787  P01_orf235 
363  440790..441419  P01_orf209 
364  441446..442099  P01_orf217 
365  442572..443450  P01_orf292 
366  443807..446908  P01_orf1033 
367  446895..447701  P01_orf268 
368  447707..448588  P01_orf293 
369  448607..448768  P01_orf53 
370  448768..449832  P01_orf354 
371  449873..450604  P01_orf243 
-  450647..451033  10SaRNA 
-  451297..451058  rnpBRNA 
372  452076..451450  P01_orf208V 
373  452813..453118  P01_orf101 
374  453148..453570  P01_orf140 
375  453614..454213  P01_orf199 
-  454252..453959  REPMP1 
376  455967..454630  H08_orf445 
377  456734..456261  H08_orf157a 
-  456769..454719  REPMP5 
378  457621..456809  H08_orf270 
-  457770..456825  REPMP4 
379  458468..457773  H08_orf231 



MG376 homolog, MYCGE 
threonyl-tRNA synthetase (thrSv); BACSU 
MG374 homolog, MYCGE 
MG373 homolog, MYCGE 
MG372 homolog, MYCGE 

hypothetical 28K protein (P1 operon) homolog; MYCPN 
hypothetical protein (HI0176) homolog; HAEIN 
MG369 homolog, MYCGE 

fatty acid/phospholipid synthesis protein (plsX); ECOLI 

ribonuclease III (rnc); ECOLI 

MG366 homolog, MYCGE 

methionyl-tRNA formyltransferase (fmt); ECOLI 

MG364 homolog, MYCGE 

ribosomal protein S20 (rpsT); ECOLI 

ribosomal protein L32 (rpL32); HAEIN 

ribosomal protein L7/L12 ('A' type) (rpL7/L12); MICLU 

ribosomal protein L10 (rpL10); THEMA 

UV protection protein (mucB); ECOLI 

Holliday junction DNA helicase (ruvB); HAEIN 

Holliday junction DNA helicase (ruvA); ECOLI 

acetate kinase (ackA); BACSU 
LicA protein (licA) homolog; HAEIN 

ATP-dependent protease binding subunit (clpB) homolog; HAEIN 

MG354 homolog, MYCGE 

MG353 homolog, MYCGE 

inorganic pyrophosphatase (ppa); THEAC 

MG350 homolog, MYCGE 
MG349 homolog, MYCGE 

repetitive DNA sequence REPMP1 

putative lipoprotein, MG348 homolog, MYCGE 

His-tRNA(CAC) gene; MYCPN 

hypothetical protein (HI0340) homolog; HAEIN 

hypothetical protein (ygl3) homolog; BACST 

isoleucine-tRNA ligase (ileS); STAAU 

triacylglycerol lipase (lip) 3; MYCMY 

MG343 homolog, MYCGE 

MG342 homolog, MYCGE 

RNA polymerase beta subunit (rpoB); BACSU 

DNA-directed RNA polymerase beta' chain (rpoC); THEMA 



MG288 homolog, MYCGE 
MG288 homolog, MYCGE 
MG096 homolog, MYCGE 
MG288 homolog, MYCGE 
MG288 homolog, MYCGE 

type I restriction enzyme EcoKI specificity protein (hsdS) homolog; HAEIN 
putative lipoprotein, MG260 homolog, MYCGE 



hypothetical 130K protein homolog (orf6, P1 operon); MYCPN 
repetitive DNA sequence REPMP5 
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 
repetitive DNA sequence REPMP4 

repetitive DNA sequence REPMP1 

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 

repetitive DNA sequence REPMP2/3 

L-ribulose-5-phosphate 4-epimerase (araD); ECOLI 

hypothetical protein (yjfS) homolog; ECOLI 

hypothetical phosphotransferase protein (yjfU) homolog; ECOLI 

hypothetical protein (yjfV) homolog; ECOLI 

hypothetical protein (yjfW) homolog; ECOLI 

recombination protein (recA); STAAU 
putative lipoprotein, MG338 homolog, MYCGE 
MG337 homolog, MYCGE 
nitrogen fixation protein (nifS); HAEIN 
MG338 homolog, MYCGE 

repetitive DNA sequence REPMP1 

hypothetical protein (yibD) homolog; ECOLI 

hypothetical protein (yihA) (era like) homolog; ECOLI 

valyl-tRNA synthetase (valS); BACST 

hypothetical protein (HI1366) homolog; HAEIN 

hypothetical protein (HI0315) homolog; HAEIN 

MG331 homolog, MYCGE 

cytidylate kinase (cmk); BACSU 

hypothetical protein (HI0136) (era like) homolog; HAEIN 

MG328 homolog, MYCGE 

triacylglycerol lipase (lip) 2; MYCMY 

homolog (degV) protein; BACSU 

ribosomal protein L33 (rpL33); BACST 

X-Pro dipeptidase (pepX); LACDE 

10Sa RNA; MYCGE 
RNase P RNA; MYCGE 

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 
putative lipoprotein 



repetitive DNA sequence REPMP1 

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN 

repetitive DNA sequence REPMP5 

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 

repetitive DNA sequence REPMP4 

hypothetical protein (yzaQ homolog: BACSU 
381 
382 
383 
384 

385 

386 

387 

388 

389 

390 

391 

392 

393 

394 

396 

395 

397 

398 

399 

400 

401 

402 

403 

404 

405 

406 

407 

408 

409 

410 

411 

412 

413 

414 

415 

416 

417 

418 

419 

420 

421 

422 

423 

424 

425 

426 



431 
432 

433 
434 
435 
436 
437 
438 

439 
440 
441 
442 
443 
444 
445 
446 
447 
448 
449 
450 
451 

452 
453 
454 
455 
456 
457 
458 
459 
460 

461 
462 
463 
464 
465 
466 



467 
468 
469 



458503..460200  H08_orf565 
460165..460885  mptgv 
460960..462735  H08_orf591 
462656..463129  H08_orf157b 
463071..464060  H08_orf329V 
464443..467460  H08_orf1005 
467624..467717  mptgs 
467786..468649  H08_orf287 
468738..469319  H08_orf193 
469340..470164  H08_orf274 
470178..472196  H08_orf672 
472236..473345  H08_orf369 
473224..474168  H08_orf314 
474180..475526  H08_orf448 
475643..476434  H08_orf263 
476498..479554  H08_orf1018 
479577..480194  H08_orf205 
481119..485096  H08_orf1325 
481124..480255  H08_orf289 
485103..486332  H08_orf409 
486317..486769  H08_orf150 
487390..487082  H08_orf102 
487860..490040  H08_orf726 
490196..490909  H08_orf237 
490965..492002  H08_orf345 
492220..493938  H08_orf572o 
494247..497981  A05_orf1244 
497991..499178  A05_orf395 
499234..501021  A05_orf595 
501179..501991  A05_orf270L 
501886..503034  A05_orf382 
503024..503977  A05_orf317 
504008..505021  A05_orf337 
505024..506253  A05_orf409 
506291..507253  A05_orf320 
508131..507259  A05_orf290 
508316..511264  A05_orf982 
511270..512316  A05_orf348 
512297..512605  A05_orf102 
512605..512994  A05_orf129 
512995..514107  A05_orf370 
514238..515665  A05_orf475 
515658..516383  A05_orf241a 
516435..519137  A05_orf900 
521188..519560  A05_orf542 
521915..521181  A05_orf244 
523050..521908  A05_orf380V 
524782..523301  A05_orf493 
524892..525311  A05_orf139 
525343..523309  REPMP5 
525388..526224  A05_orf278 
526357..525404  REPMP4 
526818..527576  A05_orf252 
528050..527890  REPMP1 
528164..527718  F11_orf148o 
528191..528045  REPMP1 
530128..528527  F11_orf533L 
530201..528684  REPMP2/3 
532483..530201  F11_orf760 
532711..535350  F11_orf879 
535464..535390  mptgwa 
535709..535455  F11_orf84 
536337..535744  F11_orf197 
537384..536344  F11_orf346 
537733..537365  F11_orf122a 
539329..537878  F11_orf483 
539611..540093  F11_orf160 
540123..540573  mptrna 
540861..542609  F11_orf582 
542671..543534  F11_orf287 
543534..544190  F11_orf218 
546388..544187  F11_orf733 
546644..549307  F11_orf887 
549474..549875  F11_orf133 
549943..551382  F11_orf479 
551403..552479  F11_orf358a 
552501..553484  F11_orf327 
553803..555011  F11_orf402 
555012..556385  F11_orf457 
556412..557431  F11_orf339 
557803..558879  F11_orf358b 
558904..558982  4.5S RNA 
559027..559716  F11_orf229 
559751..560095  F11_orf114 
560096..562477  F11_orf793o 
562480..563328  A19_orf282 
563860..563258  A19_orf200 
564732..563854  A19_orf292 
565711..564878  A19_orf277 
566586..565711  A19_orf291 
569208..566590  A19_orf872 
569524..569598  mptga 
569863..573285  A19_orf1140 
573664..574053  A19_orf129 
574399..575088  A19_orf229V 
576117..576731  A19_orf204 
578517..576742  A19_orf591 
578671..579306  A19_orf211 
579725..578587  REPMP4 
581534..580008  REPMP2/3 
581562..579349  A19_orf737V 
582203..582964  H91_orf253 
583638..583096  H91_orf180 



Na(+) translocating ATPase subunit J (ntpJ); ENTHR 

,™ '^^ a A(CAA) ' "^ ^(AC 0 ). VaI.lRNA(GTA). Thr-iRNA(ACA). Lys-tRNA(AAG). Uu- 

(KNA(CTA) genes; MYCPN 

MG321 homolog, MYCGE 

MG321 homolog, MYCGE 

adhesin P1 (group 2) homolog; MYCPN 

putative lipoprotein, MG321 homolog, MYCGE 

Ser-tRNA(TCC), Ser-tRNA(TCG) genes; MYCPN 

(cytochrome C oxidase polypeptide 1 (ctaD); BACSU) 

MG319 homolog, MYCGE 

30K adhesin-related protein; MYCPN 

cytadherence accessory protein (hmw3); MYCPN 

(competence locus E (comE3); BACSU) 

MG315 homolog, MYCGE 

MG314 homolog, MYCGE 

MG313 homolog, MYCGE 

cytadherence accessory protein (hmw1); MYCPN 

ribosomal protein S4 (rpS4); BACSU 

putative lipoprotein, MG309 homolog, MYCGE 

triacylglycerol lipase (lip) 3; Mycoplasma sp 

ATP-dependent RNA helicase (deaD); ECOLI 

putative lipoprotein, MG307 homolog, MYCGE 

MG307 homolog, MYCGE 

putative lipoprotein, MG307 homolog, MYCGE 

MG307 homolog, MYCGE 

MG307 homolog, MYCGE 

putative lipoprotein, MG307 homolog, MYCGE 

MG306 homolog, MYCGE 

heat shock protein DnaK; ERYRH 

ABC transport ATP-binding protein (cbiO); SALTY 
ABC transport ATP-binding protein (artP); ECOLI 
MG302 homolog, MYCGE 

glyceraldehyde-3-phosphate dehydrogenase (gap); CLOPA 

phosphoglycerate kinase (pgk); THEMA 

phosphotransacetylase (pta); BACSU 

hypothetical protein (yidA) homolog; ECOLI 

P115 protein homolog (SGC3); MYCHR 

cell division protein (ftsY); ECOLI 

hypothetical 13.2 KD protein homolog (ylxM); BACSU 

MG296 homolog, MYCGE 

hypothetical protein (HI0174); HAEIN 

MG294 homolog (put. permease), MYCGE 

glycerophosphoryl diester phosphodiesterase (glpQ); BACSU 

alanyl-tRNA synthetase (alaS); ECOLI 

transport system permease protein P69; MYCHR 

ATP-binding protein P29; MYCHR 

high affinity transport system protein P37; MYCHR 

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN 

repetitive DNA sequence REPMP5 

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 

repetitive DNA sequence REPMP4 

putative lipoprotein, MG440 homolog, MYCGE 

repetitive DNA sequence REPMP1 

repetitive DNA sequence REPMP1 

ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 

repetitive DNA sequence REPMP2/3 

putative lipoprotein, MG260 homolog, MYCGE 

Trp-tRNA (TGA) gene; MYCPN 

(acyl carrier protein; STRGA) 

MG286 homolog, MYCGE 

MG285 homolog, MYCGE 

MG284 homolog, MYCGE 

putative prolyl-tRNA synthetase (proS); YEAST 

transcription elongation factor (greA); RICPR 

Tyr-tRNA (TAC), Glu-tRNA (CAA), Lys-tRNA (AAA), Leu-tRNA (TTA), Gly-tRNA (GCA) genes; MYCPN 

MG281 homolog, MYCGE 

MG280 homolog, MYCGE 

MG279 homolog, MYCGE 

stringent response protein (spoT); ECOLI 

MG277 homolog, MYCGE 

adenine phosphoribosyltransferase (apt); HAEIN 

NADH oxidase (nox); ENTFA 

pyruvate dehydrogenase E1-alpha subunit (pdhA); ACHLA 
pyruvate dehydrogenase E1-beta subunit (pdhB); ACHLA 
dihydrolipoamide acetyltransferase component (E2) (pdhC); ACHLA 
dihydrolipoamide dehydrogenase (pdhD); BACST 
lipoate protein ligase (lplA); ECOLI 
MG269 homolog, MYCGE 
4.5S RNA; MYCPN 

hypothetical protein (yaaF) homolog; BACSU 

MG267 homolog, MYCGE 

leucyl-tRNA synthetase (leuS); BACSU 

hypothetical protein (yidA) homolog; ECOLI 

hypothetical protein (HI0890) homolog; HAEIN 

hypothetical protein (yidA) homolog; ECOLI 

formamidopyrimidine-DNA glycosylase (fpg); BACFI 

DNA polymerase I (polA, 5'-3' exonuclease) homolog; STRPN 

DNA polymerase III alpha subunit (dnaE); HAEIN 

Arg-tRNA gene (CGA); MYCPN 



repetitive DNA sequence REPMP4 
repetitive DNA sequence REPMP2/3 
ADP1_MYCPN adhesin P1 precursor homolog; MYCPN 
putative lipoprotein 
-  583663..583392  REPMP1 
470  585295..584327  H91_orf322 
471  586044..585226  H91_orf272 
-  586110..584114  REPMP5 
472  586934..586128  H91_orf268 
473  589311..587278  H91_orf677 
474  589658..589350  H91_orf102 
475  591151..589790  H91_orf453 
476  592230..591151  H91_orf359V 
477  592524..592231  H91_orf97 
478  593345..592569  H91_orf258 
-  593426..593353  mptgx 
479  595179..593575  H91_orf534 
-  595211..595283  - 
480  595347..597323  H91_orf658 
481  597304..598617  H91_orf437 
482  598620..599348  H91_orf242a 
483  599370..600719  H91_orf449 
484  600703..602565  H91_orf620 
485  602618..604117  H91_orf499 
486  604101..604742  H91_orf213 
487  604748..605467  H91_orf239 
488  606304..605459  H91_orf281 
489  606788..606294  H91_orf164 
490  608873..607743  H91_orf376 
491  609427..609080  H91_orf115 
492  610177..609557  H91_orf206 
493  611772..611122  H91_orf216 
494  612987..611995  H91_orf330 
495  614997..613366  H91_orf543 
496  617285..615138  H91_orf715 
497  618937..617348  H91_orf529 
498  619615..618941  H91_orf224 
499  621513..619615  F10_orf632o 
500  623381..621516  F10_orf621 
501  623625..624500  F10_orf291 
502  626726..624501  F10_orf741 
503  627693..626713  F10_orf326 
504  629948..627696  F10_orf750 
505  632530..630143  F10_orf795 
506  633935..632601  F10_orf444 
507  634844..633960  F10_orf294 
508  635310..634834  F10_orf158 
509  636124..635264  F10_orf286 
510  636431..636117  F10_orf104 
511  636726..636424  F10_orf100a 
512  637021..636719  F10_orf100b 
513  639333..637168  F10_orf721 
514  639818..639357  F10_orf153 
515  640840..639821  F10_orf339 
516  641329..640847  F10_orf160 
517  642317..641331  F10_orf328 
518  644200..642689  F10_orf503 
519  645650..644175  F10_orf491 
520  646835..645693  F10_orf380 
521  648100..646841  F10_orf419 
522  649029..648103  F10_orf308 
523  649444..649019  F10_orf141b 
-  649775..649699  mptgac 
524  649845..650117  F10_orf90 
525  650856..650200  F10_orf218 
526  651919..650846  F10_orf357 
527  657390..651934  F10_orf1818 
528  658627..657410  F10_orf405 
529  660458..658761  F10_orf565 
530  661390..660461  F10_orf309 
531  662214..661393  H10_orf273o 
532  663058..662462  H10_orf198 
533  663675..662959  H10_orf238 
-  664617..663872  mptjc 
534  666181..664655  H10_orf508 
535  667173..666187  H10_orf328 
536  667819..667193  H10_orf208 
537  669323..667803  H10_orf506 
538  670124..669324  H10_orf266 
539  670471..670112  H10_orf119 
540  670923..670474  H10_orf149 
541  671792..671130  H10_orf220L 
542  672461..671841  H10_orf206 
543  672500..673054  H10_orf184 
544  673054..673983  H10_orf309 
545  673967..674557  H10_orf196 
546  674987..674550  H10_orf145L 
547  675689..675126  H10_orf187V 
548  678142..675779  A65_orf787o 
549  679094..678738  A65_orf118 
-  680988..679736  REPMP2/3 
550  681222..679825  A65_orf465V 
551  682245..681325  A65_orf306 
552  685088..682704  A65_orf794 
-  686360..686126  REPMP1 
553  686379..686032  A65_orf115 
554  688090..687590  A65_orf166 
555  689578..688445  A65_orf377 
556  691498..689789  A65_orf569 
557  693374..691629  A65_orf581 
558  694573..693374  A65_orf399V 
559  696002..694533  A65_orf489 
560  696047..696904  A65_orf285 
561  697178..696876  A65_orf100 
562  697200..698000  A65_orf266 
563  697969..698403  A65_orf144 
564  701122..700367  A65_orf251a 



repetitive DNA sequence REPMP1 

hypothetical 130K protein homolog (orf6, P1 operon); MYCPN 
hypothetical 130K protein homolog (orf6, P1 operon); MYCPN 
repetitive DNA sequence REPMP5 

type I restriction enzyme EcoKI specificity protein (hsdS) homolog; HAEIN 

MG260 homolog, MYCGE 

putative lipoprotein, MG260 homolog, MYCGE 

possible protoporphyrinogen oxidase (hemK); ECOLI 

peptide chain release factor 1 (RF1; prfA); BACSU 

ribosomal protein L31 (rpL31); ECOLI 

MG256 homolog, MYCGE 

Trp-tRNA(TGG) gene; MYCPN 

MG255 homolog, MYCGE 

Gly-tRNA(GGC) gene; MYCPN 

DNA ligase (lig); ECOLI 

cysteinyl-tRNA synthetase (cysS); BACSU 

hypothetical protein (yacO) (rRNA methylase) homolog; BACSU 

glycyl-tRNA synthetase (grs1); YEAST 

DNA primase (dnaG); BACSU 

RNA polymerase sigma-A factor (sigA); BACSU 

MG248 homolog, MYCGE 

hypothetical protein (ygiH) homolog; ECOLI 

MG246 homolog, MYCGE 

5-formyltetrahydrofolate cyclo-ligase (HI0858) homolog; HAEIN 
Type I restriction enzyme (hsdR) homolog; ECOLI 

Type I restriction enzyme (hsdR) homolog; ECOLI 

type I restriction enzyme EcoKI specificity protein (hsdS) homolog; HAEIN 

type I restriction enzyme (hsdM); ECOLI 

DNA helicase II (mutB1); HAEIN 

DNA helicase (pcrA) homolog; STAAU 

MG243 homolog, MYCGE 

MG242 homolog, MYCGE 

MG241 homolog, MYCGE 

MG240 homolog, MYCGE 

protein (bcrA) homolog; BACLI 

putative ABC transport permease 

ATP-dependent protease (lon); BACSU 

trigger factor (tig); HAEIN 

MG237 homolog, MYCGE 

MG236 homolog, MYCGE 

endonuclease IV (nfo); ECOLI 

ribosomal protein L27 (rpL27); BACSU 

hypothetical protein (ysxB) homolog; BACSU 

ribosomal protein L21 (rpL21); BACSU 

ribonucleoside-diphosphate reductase (nrdE); SALTY 

MG230 homolog, MYCGE 

ribonucleotide reductase 2 (nrdF); SALTY 

dihydrofolate reductase (EC 1.5.1.3) (dhfr); LACLA 

thymidylate synthase (thyA); STAAU 

general amino acid permease GAP1 homolog; YEAST 

hypothetical protein (gi: 710640) homolog (put amino acid permease); CLOPE 

cell division protein (ftsZ); BACSU 

MG223 homolog, MYCGE 

hypothetical protein (yabC) homolog; ECOLI 

hypothetical protein (yabB) homolog; ECOLI 

Arg-tRNA gene (CGC); MYCPN 

MG220 homolog, MYCGE 



cytadherence accessory protein (hmw2); MYCPN 
protein P65; MYCPN 

carbamate kinase (EC 2.7.2.2) (arcC); PSEAE 
ornithine carbamoyltransferase (otcl); ECOLI 
arginine deiminase (arcA); MYCCA 
arginine deiminase (arcA); MYCCA 

Cys-tRNA(TCC), Pro-tRNA(CCA), Met-tRNA(ATG), Ile-tRNA(ATG), Ser-tRNA(TCA), fMet-tRNA(ATG), Asp-tRNA(GAC) and Phe-tRNA(TTC) genes; MYCPN 

pyruvate kinase (pyk); LACLA 

6-phosphofructokinase (pfk); ECOLI 

hypothetical protein (P35155) homolog; BACSU 

dihydrofolate reductase (dyr) homolog protein; ENTFC 

1-acyl-sn-glycerol-3-phosphate acyltransferase (pliB); YEAST 

MG211 bomolog. MYCGE 



prolipoprotein signal peptidase (Up); STACA 
hypothetical protein (yceQ homolog; ECOU 
MG208 homolog, MYCGE 

type 1 restriction enzyme eookl specificity protein (hsdS) bomolog: HAEIN 

HsdS I B protein bomolog; MYCPU 

putative lipoprotein. MG260 bomolog, MYCGE 

repetitive DNA sequence REPMP2/3 
adhesin PI (group 2) bomolog; MYCPN 
protein (prrB) bomolog. ECOU 
putative lipoprotein. MG260 bomolog, MYCGE 
repetitive DNA sequence REPMPI 

MG260 bomolog. MYCGE 

MG260 homolog. MYCGE 

MG139 bomolog. MYCGE 

OTP-binding membrane protein CepA); HAEIN 

YefE protein bomolog; ECOU 

lysyl-tRNA tyntbetase (ly»S); BACSU 

MG 135 bomolog. MYCGE 

hypothetical protein (yaaK) bomolog; BACSU 

MG 133 bomolog, MYCGE 

hypothetical protein (hid) bomolog; YEAST 

putative lipoprotein. MG440 bomolog. MYCGE 




Continued

565  703155..701674  A65_orf493       hypothetical protein (yul) homolog; MYCMY
566  703498..703145  A65_orf117       MG129 homolog, MYCGE
567  704277..703498  A65_orf259       hypothetical protein (HI0072) homolog; HAEIN
568  704714..704277  A65_orf145       hypothetical protein (ygl I) homolog; STRVR
569  704771..705811  A65_orf346       tryptophanyl-tRNA synthetase (trpS); HAEIN
570  706664..705819  A65_orf281       hypothetical protein (gi: 973220) homolog; ECOLI
571  706984..706676  A65_orf102       thioredoxin (in); YEAST
572  708477..707050  A65_orf475       MG123 homolog, MYCGE
573  710602..708467  A65_orf711       DNA topoisomerase I (topA); BACSU
574  711574..710639  A65_orf311       high affinity ribose transport protein (rbsC); HAEIN
575  713127..711574  A65_orf517       MG120 homolog, MYCGE
576  714862..713144  A65_orf572       hypothetical ABC transporter (yjcW) homolog; ECOLI
577  715893..714877  A65_orf338       UDP-glucose 4-epimerase (galE); STRTR
578  716545..715874  A65_orf223       MG117 homolog, MYCGE
579  717293..716538  A65_orf251b      MG116 homolog, MYCGE
580  718497..717814  A65_orf227       phosphatidylglycerophosphate synthase (pgsA); HAEIN
581  719821..718454  K04_orf455o      asparaginyl-tRNA synthetase (asnS); ECOLI
582  720475..719828  K04_orf215t      D-ribulose-5-phosphate 3-epimerase (cfxE); ALCEU
583  721745..720453  K04_orf430       phosphoglucose isomerase B (pgiB); BACST
584  722603..721767  K04_orf278L      hypothetical protein (yieQ) homolog; ECOLI
585  723759..722590  K04_orf389       probable protein serine/threonine kinase (YKT3); CAEEL
586  724529..723750  K04_orf259       protein phosphatase 2C homolog (ptc1); YEAST
588  725070..725720  K04_orf216       polypeptide deformylase (def); HAEIN
587  725248..724529  K04_orf239       5'-guanylate kinase (gmk); HAEIN
589  726297..725689  K04_orf202       MG105 homolog, MYCGE
590  728477..726297  K04_orf726       virulence associated protein homolog (vacB); HAEIN
591  729593..728751  K04_orf280       MG103 homolog, MYCGE
592  730530..729583  K04_orf315       thioredoxin reductase (trxB); EUBAC
593  731191..730523  K04_orf222       MG101 homolog, MYCGE
594  732602..731166  G07_orf478o      protein (pet112) homolog; YEAST
595  734028..732592  G07_orf478V      amidase homolog (S47454); YEAST
596  735470..734031  G07_orf479       MG098 homolog, MYCGE
597  736390..735668  G07_orf240       uracil DNA glycosylase (ung); ECOLI
598  737668..736415  G07_orf417       MG288 homolog, MYCGE
599  739760..738396  G07_orf454       putative lipoprotein, MG095 homolog, MYCGE
600  741185..739764  G07_orf473       replicative DNA helicase (dnaC); BACSU
601  741621..741172  G07_orf149       ribosomal protein L9 (rpL9); BACST
602  741938..741624  G07_orf104b      ribosomal protein S18 (rpS18); ECOLI
603  742428..741928  G07_orf166       single-stranded DNA binding protein (ssb); HAEIN
604  743075..742428  G07_orf215       ribosomal protein S6 (rpS6); ECOLI
605  745198..743132  G07_orf688       elongation factor G (fus); THEAQ
606  745688..745221  G07_orf155       ribosomal protein S7 (rpS7); BACST
607  746161..745742  G07_orf139       ribosomal protein S12 (rpS12); BACST
608  747359..746190  G07_orf389b      prolipoprotein diacylglyceryl transferase (lgt); ECOLI
609  748287..747349  G07_orf312       MG085 homolog, MYCGE
610  749157..748288  G07_orf289       hypothetical protein (yacA) homolog; BACSU
611  749716..749150  G07_orf188       peptidyl tRNA hydrolase homolog (pth); HAEIN
612  750396..749716  G07_orf226       ribosomal protein L1 (rpL1); BACST
613  750809..750396  G07_orf137       ribosomal protein L11 (RPL11); THEMA
614  753420..750865  G07_orf851       oligopeptide transport ATP-binding protein (oppF); BACSU
615  754654..753383  G07_orf423       oligopeptide transport ATP-binding protein (oppD); BACSU
616  755786..754656  G07_orf376       oligopeptide transport system permease protein (amiD); STRPN
617  756948..755779  G07_orf389a      oligopeptide transport system permease protein (oppB); BACSU
618  757224..757640  G07_orf138       MG076 homolog, MYCGE
619  760729..757637  G07_orf1030      protein P100; MYCPN
620  761241..760834  G07_orf135       MG074 homolog, MYCGE
621  763217..761244  G07_orf657       excinuclease ABC subunit B (uvrB); ECOLI
622  765618..763192  G07_orf808       preprotein translocase (secA); BACSU
623  768223..765605  G07_orf872V      Mg(2+) transport ATPase, P-type 1 (mgtA); ECOLI
624  769100..768216  G07_orf294       ribosomal protein S2 (rpS2); SPIPL
625  772532..769710  GT9_orf940o      PTS system, glucose-specific IIABC component (EIIABCGLC); BACSU
626  772584..772925  GT9_orf113
627  774296..772980  GT9_orf438V      ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
     774345..773095  REPMP4           repetitive DNA sequence REPMP4
628  775203..774757  GT9_orf148       MG260 homolog, MYCGE
     775230..774929  REPMP1           repetitive DNA sequence REPMP1
629  775949..775566  GT9_orf127       ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
630  776809..775868  GT9_orf313       ADP1_MYCPN adhesin P1 precursor homolog; MYCPN
     777250..775724  REPMP2/3         repetitive DNA sequence REPMP2/3
631  778005..777289  GT9_orf238       type I restriction enzyme EcoKI specificity protein (hsdS) homolog; HAEIN
632  780875..778479  GT9_orf798       putative lipoprotein, MG260 homolog, MYCGE
633  783441..781159  GT9_orf760       putative lipoprotein, MG185 homolog, MYCGE
634  784494..783535  GT9_orf319V      adenine-specific methyltransferase EcoRI (mte1); ECOLI
635  786329..784494  GT9_orf611       oligoendopeptidase F (pepF); LACLA
636  787053..786322  GT9_orf243V      pseudouridylate synthase I (hisT); ECOLI
637  788350..787046  GT9_orf434       MG181 homolog, MYCGE
638  789254..788343  GT9_orf303       histidine transport ATP-binding protein (hisP); ECOLI
639  790066..789242  GT9_orf274       sulfate transport ATP-binding protein (cysA); SYNP
640  790424..790050  GT9_orf124a      ribosomal protein L17 (rpL17); BACSU
641  791410..790427  GT9_orf327       RNA polymerase alpha core subunit (rpoA); BACSU
642  791781..791416  GT9_orf121       ribosomal protein S11 (rpS11); BACST
643  792155..791781  GT9_orf124b      ribosomal protein S13 (rpS13); BACSU
644  792268..792155  GT9_orf37        ribosomal protein L36 (rpL36); CHLTR
645  792515..792279  GT9_orf78        initiation factor 1 (infA); BACSU
646  793261..792515  GT9_orf248       methionine amino peptidase (map); BACSU
647  793908..793261  GT9_orf215       adenylate kinase (adk); BACST
648  795335..793902  GT9_orf477       preprotein translocase subunit (secY); MYCCA
649  795790..795335  GT9_orf151       ribosomal protein L15 (rpL15); MYCCA
650  796453..795794  GT9_orf219       ribosomal protein S5 (rpS5); BACSU
651  796807..796457  GT9_orf116b      ribosomal protein L18 (rpL18); BACST
652  797362..796808  GT9_orf184       ribosomal protein L6 (rpL6); MYCCA
653  797797..797369  GT9_orf142       ribosomal protein S8 (rpS8); MYCCA
654  797976..797791  GT9_orf61        ribosomal protein S14 (rpS14); MYCCA
655  798520..797978  GT9_orf180b      ribosomal protein L5 (rpL5); HAEIN
656  798858..798523  GT9_orf111a      ribosomal protein L24 (rpL24); BACST
657  799226..798858  GT9_orf122       ribosomal protein L14 (rpL14); BACST
658  799487..799230  GT9_orf85        ribosomal protein S17 (rpS17); MYCCA
659  799822..799487  GT9_orf111b      ribosomal protein L29 (rpL29); THEMA
660  800241..799822  VXpSPT7_orf139o  ribosomal protein L16 (rpL16); MYCCA
661  801062..800241  VXpSPT7_orf273   ribosomal protein S3 (rpS3); MYCCA
662  801618..801064  VXpSPT7_orf184   ribosomal protein L22 (rpL22); HAEIN
663  801808..801545  VXpSPT7_orf87    ribosomal protein S19 (rpS19); MYCBO
664  802671..801808  VXpSPT7_orf287a  ribosomal protein L2 (rpL2); MYCCA
665  803384..802671  VXpSPT7_orf237   ribosomal protein L23 (rpL23); THEMA
666  804025..803387  VXpSPT7_orf212   ribosomal protein L4 (rpL4); MYCCA
667  804888..804025  VXpSPT7_orf287b  ribosomal protein L3 (rpL3); MYCCA
668  805228..804902  VXpSPT7_orf108   ribosomal protein S10 (rpS10); THEMA
669  805660..805322  VXpSPT7_orf112
670  806869..805907  VXpSPT7_orf320   putative lipoprotein, MG149 homolog, MYCGE
671  808328..806991  VXpSPT7_orf445   MG148 homolog, MYCGE
672  809615..808482  VXpSPT7_orf377   MG147 homolog, MYCGE
673  810876..809602  VXpSPT7_orf424   hemolysin (hlyC) homolog protein; HAEIN
674  811711..810902  VXpSPT7_orf269   hypothetical protein (yiaC) homolog; PSEFL
675  812932..811724  VXpSPT7_orf402   MG144 homolog, MYCGE
676  813298..812948  VXpSPT7_orf116   ribosome binding factor A homolog (rbfA); ECOLI
677  815154..813301  VXpSPT7_orf617   protein synthesis initiation factor 2 (infB); BACST
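
The coordinates and ORF names in the listing above appear to follow a simple convention, which the following Python sketch checks for a single row. Two points are assumptions inferred from the rows rather than stated in the text: that a start coordinate larger than the end coordinate marks an ORF on the complementary strand, and that the number in the ORF name is the codon count without the stop codon. The function name `orf_summary` is hypothetical.

```python
import re


def orf_summary(coords, orf_name):
    """Return (strand, length in nt, name_matches_length) for one table row."""
    start, end = (int(part) for part in coords.split(".."))
    # Assumption: start > end marks an ORF on the complementary strand.
    strand = "+" if start < end else "-"
    length_nt = abs(end - start) + 1
    # Assumption: the digits after "orf" count codons without the stop codon.
    named_codons = int(re.search(r"orf(\d+)", orf_name).group(1))
    consistent = length_nt % 3 == 0 and length_nt // 3 - 1 == named_codons
    return strand, length_nt, consistent


print(orf_summary("703155..701674", "A65_orf493"))
print(orf_summary("725070..725720", "K04_orf216"))
```

A check of this kind can help to recover digits that were misread in a scanned copy of the table, since the coordinate span and the ORF name constrain each other.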



noteworthy: the lack of the ribosomal protein S1, of the peptide
chain release factor 2 (RF2) and of the glutaminyl-tRNA
synthetase. So far, quite a number of Gram-positive bacteria,
including Bacillus and Lactobacillus species, have also been found
to lack the S1 protein and the glutaminyl-tRNA synthetase (46).

One of the functions of the S1 protein is to bind the mRNA to
the 30S small ribosomal subunit. Therefore, it was argued that
ribosomal binding sites in front of many genes (47) of B.subtilis
compensate for the missing S1 protein. The Shine-Dalgarno
sequences are so well conserved that they could be used routinely
as a good indicator for proposing ORFs in the B.subtilis genome
sequencing projects, but this does not apply to M.pneumoniae.
There, the Shine-Dalgarno sequence is in many instances not well
conserved or missing altogether, even in genes for which we
know the translational initiation sites from independent studies.

Of the 20 standard tRNA synthetases, the glutaminyl-tRNA
synthetase is the only one not detected in M.pneumoniae. Studies
on tRNA synthetases in Gram-positive bacteria have indicated
that this enzyme is dispensable. Bacillus subtilis solves this
problem by first charging tRNA(Gln) with glutamate, which is
subsequently converted to glutamine by an amidotransferase.
The glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and
tRNA(Gln). The corresponding amidotransferase has not yet been
identified in M.pneumoniae; it is therefore still an open question
how glutamine is bound to its tRNA.

Finally, the modified codon usage of M.pneumoniae, reading
UGA as tryptophan instead of a stop codon, requires the absence
of the peptide chain release factor 2 (RF2) and the presence of
release factor 1 (RF1). RF1 recognizes the stop codons UAG
and UAA, and RF2 the stop codons UGA and UAA. Since the
UGA codon is frequently located within a gene, it is essential to
exclude RF2 to prevent the premature termination of proteins.
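
The effect of reading UGA as tryptophan can be illustrated with a minimal Python sketch. The standard codon table is built from the usual 64-letter amino acid string in UCAG order; the modified table differs only in the UGA entry. The names `translate`, `terminators` and the example mRNA are hypothetical and serve illustration only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Modified code as described in the text: UGA encodes tryptophan.
MYCOPLASMA = dict(STANDARD, UGA="W")

# Stop codons recognized by each release factor, as stated in the text.
RELEASE_FACTORS = {"RF1": {"UAG", "UAA"}, "RF2": {"UGA", "UAA"}}


def translate(mrna, table):
    """Translate an mRNA from its first codon up to the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def terminators(codon, present=("RF1",)):
    """List the release factors, among those present, that recognize a codon."""
    return [rf for rf in present if codon in RELEASE_FACTORS[rf]]


mrna = "AUGUGAAAAUAA"  # hypothetical ORF with an internal UGA codon
print(translate(mrna, STANDARD))    # truncated at the internal UGA
print(translate(mrna, MYCOPLASMA))  # UGA read through as tryptophan
print(terminators("UGA"), terminators("UGA", ("RF1", "RF2")))
```

With RF1 alone no factor terminates at UGA, whereas adding RF2 to the set of factors present would terminate the chain at every internal UGA codon.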

Surface structure, cytadherence-associated proteins and 
cell division 

This category comprises the adhesins and the cytadherence-associated
proteins, including the components of the cytoskeleton-like
structure, the function of which is probably to stabilize and
maintain the shape of the wall-less mycoplasma, to direct proteins
to certain regions in the membrane and to keep them in these
positions (2). Adherence to the receptor(s) of the host cell depends
on the tip structure. The correct assembly of the adhesin P1
(E07_orf1627) and the 30 kDa adhesin-related protein on the tip
structure (H08_orf274) is necessary for attachment. The tip structure
is an interesting example of bacterial cellular asymmetry (48).

The cytadherence-associated proteins were originally defined
by hemadsorption-negative mutants which had lost certain
proteins like the so-called high molecular weight proteins HMW1,
HMW2 and HMW3, the adhesin P1 and the proteins named A,
B and C (2,28). B and C are most probably the gene products of
the ORF6 gene of the P1 operon (40 kDa protein = C, 90 kDa
protein = B). The gene for A is still unknown. Another criterion
for a putative protein of the cytoskeleton-like structure is its
partitioning into the Triton X-100 insoluble fraction after treating
M.pneumoniae with this detergent. This fraction is ill defined and
comprises ~50 proteins, of which only a subfraction is associated
with the cytoskeleton and/or cytadherence. The following
proteins have been identified as most likely components of a
cytoskeleton (2): HMW1 (H08_orf1018), HMW2 (F10_orf1818;
Krause, submitted), HMW3 (H08_orf672), P200
(D02_orf1036o) (49), P65 (F10_orf405) (27). These proteins,
with the exception of HMW2, share some common peculiar
features, like an extended acidic proline-rich domain and an
abnormal migration in SDS-PAGE (49). The adhesin P1 is mainly
distributed in the membrane fraction and to a lesser extent in the
Triton X-100 insoluble fraction (50).

A large number of proposed ORFs contain sequences with high
similarities to subregions of either the P1 protein or the ORF6
gene product of the P1 operon. The coding DNA sequences
correspond to the repetitive DNA sequences RepMP2/3 (P1),
RepMP4 (P1) and RepMP5 (ORF6). Preliminary experiments
indicate that the proposed ORFs are not expressed under standard
laboratory conditions. It has been observed that another independent
isolate of M.pneumoniae, the strain FH, carries a different
copy of RepMP2/3, RepMP4 and RepMP5 in its P1 operon than
the M.pneumoniae strain M129 which is the subject of this paper
(51,52). All experimental data so far show that only the repetitive
sequences which are part of the P1 operon are expressed. The
exchange of these copies presumably takes place by gene
conversion, as was indicated by DNA sequence analysis of the
corresponding RepMP5 sequences in M.pneumoniae strains
M129 and FH. The situation is different for RepMP1, copies of
which seem to be part of several expressed proteins:
RepMP1-specific antibodies recognize several proteins on western
blots of M.pneumoniae protein extracts (26).

Little is known about cell division in M.pneumoniae. The
lack of mutants, especially of conditional mutants, has prevented
a detailed analysis. So far, the two proteins FtsZ and FtsH are
classified as cell division proteins in analogy to their function in
other bacteria (53). Other genes involved in chromosome
partitioning or septum formation have not been identified in
M.pneumoniae. Interesting problems to study might include the
possible interaction of FtsZ with components of the cytoskeleton-like
structure, which seems to play a key role in cell division, or
the effects of cellular asymmetry on cell division and the
formation of daughter cells. Other genes known to be involved in
cell division in E.coli, the muk and min genes or additional fts
genes, were not found in M.pneumoniae (53).

Lipoproteins 

Altogether 46 proteins were identified as lipoproteins based on the
following characteristic lipoprotein-specific features (54): (i) one or
more basic amino acids among the first 5-7 amino acids of the
N-terminus, (ii) a hydrophobic signal peptide and (iii) a cysteine
residue immediately downstream of the signal peptide, which is
available for modification by the transfer of the diacylglyceryl
moiety from glycerophospholipid to its sulfhydryl group. The
precursor prolipoprotein with the modified cysteine is subsequently
cleaved in M.pneumoniae by a specific signal peptidase (signal
peptidase II). The modified cysteine will then be the first amino
acid of the processed protein. The cleavage site, including the
cysteine and the three amino acids located upstream (positions -3,
-2 and -1), is to some extent conserved (-3: 37xL, 6xF, 1xA,
1xV; -2: 19xS, 10xA, 8xT, 6xV, 2xI; -1: 37xA, 7xS, 1xG).
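
The three features listed above can be turned into a rough screening heuristic, sketched here in Python. The residues allowed at positions -3, -2 and -1 are those tallied in the text. Everything else is an assumption made for illustration: the function name `find_lipobox`, the fixed window of seven N-terminal residues, the maximal signal peptide length, the set of residues counted as hydrophobic and the threshold on their fraction. The example sequences are invented.

```python
import re

# Residues observed at positions -3, -2 and -1, followed by the cysteine.
LIPOBOX = re.compile(r"[LFAV][SATVI][ASG]C")
BASIC = set("KR")
HYDROPHOBIC = set("AILMFVW")  # assumption: a simple hydrophobic alphabet


def find_lipobox(protein, max_signal_len=35, min_hydrophobic_fraction=0.6):
    """Return the 0-based index of the candidate lipid-modified cysteine, or None."""
    # (i) one or more basic residues among the first seven amino acids
    if not BASIC & set(protein[:7]):
        return None
    for match in LIPOBOX.finditer(protein[:max_signal_len + 1]):
        # (ii) a mostly hydrophobic stretch between N-terminus and lipobox
        core = protein[7:match.start()]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction >= min_hydrophobic_fraction:
            # (iii) the cysteine that follows the signal peptide
            return match.end() - 1
    return None


precursor = "MKKLLLAIVALFLLTACSSNKE"  # invented example sequence
cys = find_lipobox(precursor)
print(cys, precursor[cys:])  # the mature protein would start at the cysteine
```

Such a pattern search only proposes candidates; as the labelling experiments discussed below indicate, a predicted lipobox does not prove that the protein is expressed or lipid-modified.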

The number of lipoproteins in M.pneumoniae is relatively high
compared with the Gram-negative bacteria E.coli and H.influenzae.
Even in the closely related M.genitalium only 21 putative
lipoproteins could be found by analyses of the published data (9).

The lipoproteins of M.pneumoniae can be divided into six 
subgroups based on sequence similarities; also included in these 
groups are proteins with similarities to lipoproteins but without 
the lipoprotein signature at the N-terminal end. Quite a number 
of these proposed genes with high similarities are organized in 
tandem. For instance seven lipoproteins and one protein without 
the lipobox but with otherwise extended similarities are located 
between genome positions 249 627 and 256 463 (cosmid 
pcosMPE09). A gene family, with 13 proposed ORFs including 
five lipoproteins, is located between 306 862 and 320 524 
(cosmid pcosMPD02). Presently it is unclear whether all of the 
proposed genes are expressed. 

In vivo labelling of M.pneumoniae with 14C-labelled palmitic
acid and protein analysis by SDS-PAGE reveal, instead of the
expected 46 lipoproteins, only between 20 and 25 lipoproteins
(Pyrowolakis, unpublished data). This discrepancy could be
explained by a regulated expression which only allows
some of the several tandemly organized lipoproteins to be
synthesized, by labelling with palmitic acid that was not
sensitive enough, or by some lipoproteins carrying fatty acids other
than palmitic acid. Only four of all the proposed lipoproteins
show significant similarities to other bacterial genes besides the
ones from M.genitalium. These include A05_orf380V [high
affinity transport system P37 with unknown specificity from
Mycoplasma hyorhinis (55)], D09_orf384 (aerobic glycerol-3-phosphate
dehydrogenase, glpD), H03_orf213 (uridine kinase)
and D02_orf207 (ATP synthase b subunit, atpF).

The processing of the prolipoprotein to the mature lipoprotein in
E.coli requires the three enzymes prolipoprotein diacylglyceryl
transferase, prolipoprotein signal peptidase and apolipoprotein
transacylase. We find in M.pneumoniae only the transferase, which
catalyzes the thioether linkage between the diacylglycerol and the
cysteine, and the peptidase, which cleaves in front of the cysteine
following the signal peptide. The transacylase could be identified
neither in M.pneumoniae nor in M.genitalium (9). Therefore it is still
an open question whether a third fatty acid is linked to the cysteine by an
amide bond, as has been found for lipoproteins of E.coli.

The absence of a periplasmic space provides reasons for the
existence of a large number of lipoproteins. For surface-exposed
proteins which have to function on the outside, anchoring them via
long chain fatty acids at the M.pneumoniae cell membrane is an
efficient solution. Already known examples are substrate-binding
proteins of transport systems or proteins possibly involved in
antigenic variation for evasion of the immune system of the host, as
has been shown for other mycoplasmas (56). Nothing is known
about the fate of the cleaved signal peptides, as to whether they are
degraded or recycled.

Transport systems 

In light of the scarcity of metabolic pathways and the marked
dependence on exogenous nutrients (Table 1, Fig. 5), we expected
M.pneumoniae to code for many transport systems to compensate
for its inability to synthesize essential compounds like amino
acids. Three different transport systems, mainly involved in
import, were found in M.pneumoniae: (i) the ABC transporter
system (57), consisting of two ATP-binding, two membrane-spanning
and one substrate-binding domain, which are frequently
present on separate polypeptides but sometimes have
two or three different domains located on the same peptide
(D12_orf634 or D12_orf623), (ii) the phosphoenolpyruvate:carbohydrate
phosphotransferase system (PTS) (58) and (iii)
facilitated diffusion systems with transmembrane proteins functioning
as specific carriers. According to the present status of
annotation, Mycoplasma pneumoniae has 43 genes involved in the
above-mentioned transport systems. In addition, there are
several proposed proteins with 6 or 12 transmembrane segments
which are candidates for membrane-spanning domains of transport
systems. The relatively low number of proteins listed in
Table 1 indicates that at least some of the systems might not be
very substrate specific, e.g. the transport systems for amino acids.
Transport systems for histidine and glutamine, an ORF showing
significant similarity to a probable aromatic amino acid permease
from yeast, and an ABC transport system for oligopeptides were
identified based on similarity of the ATP-binding domains of
ABC transporters.

Surprisingly, we could not identify a transport system for the 
precursors for RNA and DNA synthesis, namely adenine, 
guanine, uracil and thymine which are essential components of 
mycoplasma growth media. 

In this context one has to be aware of the ambiguity in the 
identification of ABC transport proteins on the basis of sequence 
similarity of the ATP-binding proteins with respect to the 
predicted substrate to be transported, since database searches 
indicate numerous candidates with different specificities but with 
very similar, high score values. All the annotations in this paper 
were done on the basis of the highest score values. Therefore it 
might be possible that the predicted specificity disagrees with the 
in vivo activity in M.pneumoniae. Additional information from 
similarities to transmembrane domains or the substrate-binding 
proteins is only rarely at hand, since, in general, similarities 
among these domains are not well conserved. Even in positive 
examples, the score values are relatively low. Sometimes 
additional circumstantial evidence is derived from an operon-like 
organisation of the genes coding for ABC transporters, e.g. the 
unspecified ABC transporter consisting of the proteins P69, P29 
and P37 from nucleotide 519 560 to 523 050 (A05_orf542, 
A05_orf244 and A05_orf380V). A05_orf542 could act as the
membrane-spanning domain, A05_orf244 as the ATP-binding
domain and A05_orf380V as a putative lipoprotein which could
function as a substrate-binding protein. These proteins were also
identified by their significant similarity to the corresponding 
genes in M.hyorhinis (55). 

In M.pneumoniae the ABC transport system for oligopeptides
consists of two different transmembrane domains [G07_orf376 = amiD
(= oppC in B.subtilis); G07_orf389a = oppB] and two different
ATP-binding domains (G07_orf851 = oppF, G07_orf423 = oppD). It is also
organized in an operon-like arrangement from nucleotide 750 865
to 756 948. In striking contrast to B.subtilis, the substrate-binding
domain (oppA) is absent in M.pneumoniae. Since an oppA
homolog is also absent in M.genitalium, a sequencing or
annotation error seems unlikely. It remains to be experimentally
determined whether the substrate-binding protein is dispensable
or is part of one of the transmembrane or ATP-binding proteins.
It is also possible that one or more of the lipoproteins function as
substrate-binding proteins.

Figure 5. Schematic diagram of the metabolic pathways of M.pneumoniae deduced from Table 1. Shaded arrows with question marks indicate missing
activities.

There is also evidence for bacterial ABC export systems in
M.pneumoniae (59). For example, D12_orf634 (msbA),
D12_orf623 (pmd1) and D02_orf660 (lcnDR3) have the conserved
ATP-binding motif and the membrane-spanning domains
on the same polypeptide. In addition, D12_orf623 and
D12_orf634 also show significant similarities to multidrug
resistance proteins of different organisms.

Among the proposed PTS transport systems, we identified one
for glucose and one for mannitol. They are similar to the
homologous systems from several Gram-positive bacteria, with
the EIIA and EIIBC domains on two separate polypeptides for the
mannitol transport system and with three domains (EIIABC) of
enzyme II in one polypeptide for the glucose transport system.

Besides glucose and mannitol, fructose also seems to be 
imported by the PTS system. According to our data the 
fructose-permease II component R02_orf694 (fruA) contains all 
three domains of enzyme II in one gene (EIIABC). In addition, 
R02_orf694 and the 1-phosphofructokinase (fruK, R02_orf300) 
are probably in one operon, but we do not find fruF which is also 
part of the fructose operon in enteric bacteria (58). 

Protein secretion 

Both Gram-positive and Gram-negative bacteria have a well
conserved protein translocation system. The components identified
as part of the well characterized E.coli system (60)
include cytosolic chaperones or regulators [trigger factor, SecB,
DnaK, SRP (a ribonucleoprotein composed of 4.5S RNA and
Ffh) and FtsY] which deliver the protein to a membrane receptor
(SecA). The receptor is also supposed to function as a motor,
pushing the protein across the membrane via specific protein
channels (SecY, SecG, SecE, SecD and SecF). The secreted
proteins to be transported carry an N-terminal signal peptide
which will be removed by a signal peptidase (SPaseI). Two routes
of export have been proposed, either via SecB and SecA or via
SRP. The protein secretion system in M.pneumoniae is less
complex (Table 1). So far, the trigger factor, DnaK, SRP, FtsY and
SecA have been identified. Of the channel-forming proteins
only SecY is present, but SecG, SecF, SecE, SecD and the
cytosolic receptor protein SecB are missing. Also absent is the
signal peptidase SPaseI, although computer-assisted motif prediction
programs indicate the presence of corresponding substrates
(signal peptides). The simplified protein export system might be
a reflection of the fact that M.pneumoniae is only surrounded by
a cytoplasmic membrane. Another problem concerns refolding of
secreted proteins, which are normally exported in an unfolded
state. Refolding might be catalyzed by chaperones which have to
function on the cell surface (60). This might impose a special
problem on the wall-less bacteria in general, since they do not
possess a periplasmic space which could prevent proteins from
diffusing. To anchor the proposed chaperones on the cell surface
as lipoproteins would be a possible way to solve this problem.

Nucleotide synthesis: purine and pyrimidine salvage
pathways

Guanine, guanosine, uracil, thymine, thymidine, cytidine, adenine
and adenosine may serve as precursors for nucleic acids and
nucleotide coenzymes, as determined in nutritional studies of
Mollicutes. These components can be used for the synthesis of
ribonucleotides by the salvage pathway, as predicted from the
enzymes listed (Table 1, Fig. 5). The ribonucleotides are converted
to deoxyribonucleotides by ribonucleoside-diphosphate reductase,
an enzyme complex formed by the gene products of nrdE
(F10_orf721) and nrdF (F10_orf339). Adenine, guanine and uracil
can be metabolized directly to the corresponding nucleoside
monophosphates by the enzymes adenine phosphoribosyltransferase
(apt, F11_orf133), hypoxanthine-guanine phosphoribosyltransferase
(hpt, K05_orf175) and uracil phosphoribosyltransferase
(upp, B01_orf178). Uridylate, adenylate and guanylate
kinases catalyze the generation of ADP, GDP and UDP.
Surprisingly, we could not find the nucleoside diphosphate kinase
(ndk), the key enzyme for the conversion from NDP to NTP. This
finding is in agreement with data from the genomic sequence
analysis of M.genitalium.

Another important enzyme, the CTP synthetase, which converts
UTP to CTP, is also missing. Therefore the only route for the
synthesis of CTP appears to be from cytidine to CMP by uridine
kinase (H03_orf213) and to CDP by cytidylate kinase
(P01_orf217). Deoxythymidine monophosphate (dTMP) could
be synthesized either by thymidine kinase (tdk, B01_orf191) or
by thymidylate synthase (thyA, F10_orf328).

It will be of special interest to experimentally identify the 
enzyme(s) of M.pneumoniae which convert NDPs to NTPs, since 
such an enzymatic activity seems to be essential. 
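
The gap described above can be made explicit by treating the salvage steps named in the text as a small directed graph and asking which compounds are reachable from the imported precursors. The following Python sketch is a toy model only: it contains just the conversions mentioned in this section, it ignores all other reactions of the cell (for example ATP formation in glycolysis), and the names `SALVAGE`, `NDK_STEP` and `reachable` are hypothetical.

```python
from collections import deque

# Conversions named in the text (substrate -> products).
SALVAGE = {
    "adenine": ["AMP"],   # adenine phosphoribosyltransferase
    "guanine": ["GMP"],   # hypoxanthine-guanine phosphoribosyltransferase
    "uracil": ["UMP"],    # uracil phosphoribosyltransferase
    "cytidine": ["CMP"],  # uridine kinase
    "AMP": ["ADP"],       # adenylate kinase
    "GMP": ["GDP"],       # guanylate kinase
    "UMP": ["UDP"],       # uridylate kinase
    "CMP": ["CDP"],       # cytidylate kinase
}
# The step for which no gene (ndk) was found.
NDK_STEP = {"ADP": ["ATP"], "GDP": ["GTP"], "UDP": ["UTP"], "CDP": ["CTP"]}


def reachable(precursors, reactions):
    """Breadth-first search over the reaction graph."""
    seen = set(precursors)
    queue = deque(precursors)
    while queue:
        compound = queue.popleft()
        for product in reactions.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


precursors = ["adenine", "guanine", "uracil", "cytidine"]
ntps = {"ATP", "GTP", "UTP", "CTP"}
print(sorted(ntps & reachable(precursors, SALVAGE)))
print(sorted(ntps & reachable(precursors, {**SALVAGE, **NDK_STEP})))
```

In this restricted model no nucleoside triphosphate is reachable until an NDP to NTP step is added, which is why an as yet unidentified enzymatic activity of this kind is expected.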



Carbohydrate metabolism and energy conservation 

The ability to metabolize glucose and/or arginine and use them for
ATP synthesis is one of the key features in the classification of
Mollicutes. Mycoplasma pneumoniae is listed in Bergey's manual
of systematic bacteriology as a glucose fermenter but not as an
arginine-hydrolyzing species (61). This contrasts with our
sequencing results, since the three enzymes involved in the
arginine degradation pathway, arginine deiminase (H03_orf438),
ornithine carbamoyltransferase (H10_orf273) and carbamate
kinase (F10_orf309), are present according to our sequence data.
The arginine deiminase gene occurs twice, but one copy is inactive
due to a frameshift mutation resulting in two proposed ORFs
(H10_orf198 and H10_orf238) corresponding to the N-terminal
and C-terminal halves of a complete deiminase. The change in
reading frame was also confirmed by sequencing of directly
amplified genomic DNA. All these proposed ORFs are organized
in an operon-like arrangement except for the deiminase
(H03_orf438), which seems to be expressed as a single gene
located far away from the mentioned operon. Included in this
operon is a proposed protein (F10_orf565) with 12 predicted
transmembrane domains indicative of a putative permease.

Glucose, fructose and mannitol are transported by the PTS
system into the cell and further degraded by the Embden-Meyerhof-Parnas
(EMP) pathway to pyruvate. All enzymes required
for this pathway have been identified. The second pathway for
metabolizing glucose, the pentose phosphate pathway, is incomplete
in M.pneumoniae. We found only the enzymes ribulose-5-phosphate-3-epimerase
and transketolase (Fig. 5). Glucose-6-phosphate
dehydrogenase (G6Pde), 6-phosphogluconate dehydrogenase
(6PGde) and a transaldolase are missing. These data agree
with enzymatic studies showing that G6Pde and 6PGde are absent
in mycoplasmas (62).

Pyruvate can be further metabolized by two alternative reactions,
either to lactate by lactate dehydrogenase (K05_orf312) or to
acetyl-CoA by the pyruvate dehydrogenase complex and further
to acetate by the phosphotransacetylase (A05_orf320, pta) and the
acetate kinase (G12_orf390, ackA). The pyruvate dehydrogenase
complex consists of E1α (F11_orf358a) and E1β (F11_orf327), the
two subunits of the pyruvate dehydrogenase, the dihydrolipoamide
acetyltransferase E2 (F11_orf402) and the dihydrolipoamide
dehydrogenase E3 (F11_orf457). The corresponding genes are
clustered (nt 549 943-557 431; pcosMPF11); this cluster
also contains the genes coding for NADH oxidase (nox,
F11_orf479) and lipoate protein ligase (lplA, F11_orf339). The
latter enzyme joins lipoic acid in an amide linkage to the ε-amino
group of a lysine residue of the dihydrolipoamide acetyltransferase.

Membrane phospho- and glycolipid synthesis 

In M.pneumoniae strain FH the following membrane phospho- 
and glycolipids have been found: digalactosyldiacylglycerol, 
trigalactosyldiacylglycerol, glucosylgalactosyldiacylglycerol, 
phosphatidylglycerol (PG) and diphosphatidylglycerol (DPG) 
(63). Since M.pneumoniae FH and M.pneumoniae M129 are very 
similar we assume that both strains carry essentially the same 
genes for phospho- and glycolipid-synthesis. 

About 10 genes are required for the synthesis of the above- 
mentioned lipids; but according to our DNA sequence analysis 
only three of the expected genes could be unambiguously 
identified. They code (Fig. 5) for the enzymes 
1-acylglycerol-3-phosphate acyltransferase (plsC; gene name in 
Saccharomyces cerevisiae is slc1), phosphatidic acid 
cytidyltransferase (cdsA) and glycerolphosphate 
phosphatidyltransferase (pgsA). These 
enzymes are involved in the biochemical pathway for the 
synthesis of PG and DPG. Missing are the glycerol-3-phosphate 
acyltransferase (plsB) catalysing the synthesis of 
1-acylglycerol-3-phosphate (acyl-G3P) from glycerol-3-phosphate 
(G3P), the phosphatidylglycerol phosphate phosphatase which converts 
phosphatidylglycerol-3-phosphate to PG and finally the cardiolipin 
synthetase (cls) which synthesizes DPG from PG. Interestingly, 
we find a gene homologous to the plsX gene from E.coli which 
is involved in membrane lipid synthesis in an undefined manner. 
The glycolipid synthesis could start with phosphatidic acid and 
would probably require a phosphatidic acid phosphatase and 
several UDP-glucosyl- or galactosyltransferases. None of these 
enzymes could be identified by similarity searches in databases. 

As expected from biochemical studies no gene involved in fatty 
acid or cholesterol synthesis was determined in the sequence 
analysis. These components are incorporated as such from the 
medium. 

An interesting enzyme is the proposed carnitine palmitoyl- 
transferase encoded by C09_orf600, which might be involved in 
the modification of exogenous phosphatidylcholine (67). 

CONCLUSIONS 

It is impossible to address each proposed M.pneumoniae gene in 
this paper. We have tried to cover the most important categories 
of functions and point to genes which should be present, but could 
not be found by our applied methods. Typical examples are the 
missing diphosphonucleoside kinase for the conversion of 
(d)NDPs to (d)NTPs, and the substrate binding domain (oppA) 
for the oligopeptide ABC transporter. In addition, we could not 
find any indication for a number of genes/proteins, which should 
be there based on experimental evidence. Mycoplasma pneumoniae 
has been shown to be motile and to exhibit chemotactic behaviour 
(64). Motility genes are difficult to identify since the motility of 
M.pneumoniae is independent of pili or flagella and it is not 
known which are potential candidates. Therefore, any progress in 
this field depends on the isolation of mutants. Furthermore, none 
of the components of the chemotactic signal pathway, the Che 
proteins, which are well conserved among bacteria, or any 
'two-component signal transduction system' could be detected. 
Chemotactic behaviour in M.pneumoniae is difficult to study. 
While it might be possible that these bacteria are chemotaxis 
negative, only additional experiments will clarify this point. 

It has been reported that M.pneumoniae produces hydrogen 
peroxide considered to be a pathogenicity factor (17). Therefore, 
to protect itself from oxidative stress one would expect to find the 
standard enzymes dealing with these stress factors like catalase, 
superoxide dismutase or peroxidase, but we have no 
similarity-based evidence that these enzymes exist in M.pneumoniae. 
Experimental data on this topic are also inconsistent (62). 

The results of our sequence analysis explain quite well the kind 
of changes which have led to the observed reduction of the 
genome size in M.pneumoniae from the presumed genome size 
of several million base pairs of the ancestral bacteria. The main 
cause is the loss of complete anabolic (no amino acid synthesis) 
and metabolic pathways and of genes for the synthesis of complex 
structures like the bacterial cell wall which requires a large 
number of genes. In addition, for several processes like DNA 
repair, DNA recombination, cell division or protein secretion, the 
number of genes involved is smaller than in the more complex 
bacteria. 

No significant changes were observed in the size of individual 
genes which resemble more or less their counterparts in E.coli or 
B.subtilis. The occasionally observed smaller intergenic regions, 
like those found in the ATPase operon, do not appear to 
significantly contribute to the overall genome size reduction. 

In contrast with the loss of complete pathways we frequently 
observed the amplification of complete genes or segments of 
genes (see sections on lipoprotein families or on the repetitive 
DNA sequences RepMP2/3, RepMP4 and RepMP5). In these two 
instances the obvious advantage would be the potential for 
expressing antigenic variants of surface-exposed proteins. 

The various truncated genes which are also present in 
full-length copies, e.g. arginine deiminase (H03_orf438 and 
H03_orf238), DNA primase (H91_orf620 and D12_orf212) and 
the dihydrofolate reductase (H10_orf506 and F10_orf160), might 
be relics of recombination events which took place in the course 
of the process of evolution. 

Finally, among the many proposed proteins are a few which 
share the highest similarity over their entire length with a 
eukaryotic protein. The most prominent examples are the pre-B 
cell enhancing factor (pbeF, D09_orf451) and the carnitine 
palmitoyltransferase II precursor (cpt2, C09_orf600). Both might 
be candidates for examples of horizontal gene transfer, but at the 
present state of analysis a definitive answer cannot be given. 

It will be the main task of future studies to reconcile the 
experimental evidence and the DNA sequence-based predictions, 
i.e. to identify the genes for observed functions and vice versa, 
and to assign functions to proposed open reading frames with 
hitherto unknown functions. 
One obvious topic is the comparative analysis between the 
completely sequenced genomes of the closely related species 
M.pneumoniae and M.genitalium (9). Since the present paper is 
already very voluminous we decided to publish this analysis in an 
additional paper (Himmelreich et al., in preparation). 
Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs 
comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of 
the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest 
family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is 
devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of 
five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the 
capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved 
in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces 
species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage 
infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of 
bacterial pathogenesis. 
Techniques for large-scale DNA sequencing have brought about a 
revolution in our perception of genomes. Together with our under- 
standing of intermediary metabolism, it is now realistic to envisage 
a time when it should be possible to provide an extensive chemical 
definition of many living organisms. During the past couple of 
years, the genome sequences of Haemophilus influenzae, 
Mycoplasma genitalium, Synechocystis PCC6803, Methanococcus 
jannaschii, M. pneumoniae, Escherichia coli, Helicobacter pylori, 
Archaeoglobus fulgidus and the yeast Saccharomyces cerevisiae have 
been published in their entirety 1 "", and at least 40 prokaryotic 
genomes are currently being sequenced. Regularly updated lists of 
genome sequencing projects are available at 
http://www.mcs.anl.gov/home/gaasterl/genomes.html (Argonne National 
Laboratory, Illinois, USA) and http://www.tigr.org (TIGR, Rockville, 
Maryland, USA). 

The list of sequenced microorganisms does not currently include 
a paradigm for Gram-positive bacteria, which are known to be 
important for the environment, medicine and industry. Bacillus 
subtilis has been chosen to fill this gap 9 - 10 as its biochemistry, 
physiology and genetics have been studied intensely for more 
than 40 years. B. subtilis is an aerobic, endospore-forming, rod- 
shaped bacterium commonly found in soil, water sources and in 
association with plants. B. subtilis and its close relatives are an 
important source of industrial enzymes (such as amylases and 
proteases), and much of the commercial interest in these bacteria 
arises from their capacity to secrete these enzymes at gram per litre 
concentrations. It has therefore been used for the study of protein 
secretion and for development as a host for the production of 
heterologous proteins". B. subtilis (natto) is also used in the 
production of Natto, a traditional Japanese dish of fermented 
soya beans. 

Under conditions of nutritional starvation, B. subtilis stops 
growing and initiates responses to restore growth by increasing 
metabolic diversity. These responses include the induction of 
motility and chemotaxis, and the production of macromolecular 
hydrolases (proteases and carbohydrases) and antibiotics. If these 
responses fail to re-establish growth, the cells are induced to form 
chemically, irradiation- and desiccation-resistant endospores. 
Sporulation involves a perturbation of the normal cell cycle and 
the differentiation of a binucleate cell into two cell types. The 
division of the cell into a smaller forespore and a larger mother cell, 
each with an entire copy of the chromosome, is the first morpho- 
logical indication of sporulation. The former is engulfed by the 
latter and differential expression of their respective genomes, 
coupled to a complex network of interconnected regulatory 
pathways and developmental checkpoints, culminates in the 
programmed death and lysis of the mother cell and release of the 
mature spore. In an alternative developmental process, B. subtilis is 
also able to differentiate into a physiological state, the competent 
state, that allows it to undergo genetic transformation. 

General features of the DNA sequence 

Analysis at the replicon level. The B. subtilis chromosome has 
4,214,810 base pairs (bp), with the origin of replication coinciding 
with the base numbering start point, and the terminus at about 
2,017 kilobases (kb) 15. The average G + C ratio is 43.5%, but it 
varies considerably throughout the chromosome. This average is 
also different if one considers the nucleotide content of coding 
sequences, for which G and A (24% and 30%) are relatively more 
abundant than their counterparts C and T (20% and 26%). A 
significant inversion of the relative G - C/G + C ratio is visible at 
the origin of replication, indicating asymmetry of the nucleotide 
composition between the replication leading strand and the lagging 
strand. Several A + T-rich islands are likely to reveal the signature 
of bacteriophage lysogens or other inserted elements (Fig. 1; see 
below). 
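
The two composition measures used in this paragraph, the G + C ratio and the (G - C)/(G + C) skew, can be sketched for sliding windows as follows. This is a minimal illustration that assumes the sequence is available as a plain DNA string; the function names, window size and toy sequence are hypothetical and do not reproduce the procedure used for the published analysis.

```python
def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 if none)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def gc_skew(seq):
    """(G - C) / (G + C) for seq; 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed(seq, func, window, step):
    """Apply func to successive windows; returns (start, value) pairs."""
    return [(start, func(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    # Invented toy sequence: a G-rich half followed by a C-rich half,
    # so the skew changes sign in the middle.
    toy = "GGGGAGGTGG" * 5 + "CCCCTCCACC" * 5
    print("G + C fraction:", round(gc_content(toy), 3))
    for start, skew in windowed(toy, gc_skew, window=20, step=20):
        print(start, round(skew, 2))
```

On a real chromosome the window would be thousands of bases wide, and a change in the sign of the skew is the kind of signal the text associates with the origin of replication.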

We have analysed the abundance of oligonucleotides ('words') in 
the genome in various ways: absolute number of words in the 
genomic text, or comparison with the expected count derived from 
several models of the chromosome (for example, Markov models or 
simulated sequences in which previously known features of the 
genome were conserved 17). Comparing the experimental data with 
various models allowed us to define under- and overrepresentation 
of words in the experimental data set by reference to the model 
chosen. In general, the dinucleotide bias follows closely what has 
been described for other prokaryotes, in that the dinucleotides 
most overrepresented are AA, TT and GC, whereas those less 
represented are TA, AC and GT. Plots of the frequencies of AG, 
GA, CT and TC in sliding windows along the chromosome show 
dramatic decreases or increases around the origin and terminus of 
replication (data not shown). Trinucleotide frequency, directly 
related to the coding frame, will be discussed below. The 
distribution of words of four, five and six nucleotides shows significant 
correlations between the usage of some words and replication 
(several such oligonucleotides are very significantly overrepresented 
in one of the strands and underrepresented in the other one). 
Setting a statistical cut-off for the significance of duplications^! 
10 , we expected duplication by chance of words longer than 24# 
nucleotides to be rare 31. In fact, the genome of B. subtilis contains a 
plethora of such duplications, some of them appearing more than 
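
The idea of comparing observed word counts with a model expectation can be sketched for dinucleotides. The example below deliberately uses the simplest possible reference, a zero-order model in which bases are independent, rather than the Markov models or simulated sequences mentioned in the text; the function name is hypothetical and the code is an illustration, not the authors' method.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Observed/expected ratio for each dinucleotide present in seq.

    The expected frequency assumes independent bases (zero-order model):
    f(x) * f(y). Ratios above 1 suggest over-representation, ratios
    below 1 under-representation, relative to that model only.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))  # overlapping words
    n_pairs = n - 1
    return {
        pair: (count / n_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pairs.items()
    }


if __name__ == "__main__":
    toy = "AATTGCGCAATTGCAAGCTT"  # invented sequence, for illustration only
    for pair, ratio in sorted(dinucleotide_odds(toy).items(),
                              key=lambda item: item[1], reverse=True):
        print(pair, round(ratio, 2))
```

Dinucleotides that never occur are simply absent from the result. Whether a given ratio is significant depends on the model and on sequence length, which is why the text reports over- and underrepresentation only by reference to the model chosen.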
2034 pectate lyase 
3688 mannose-6-phosphate isomerase 
2053 phosphoenolpyruvate synthase 
3865 phosphotransacetylase 
1459 histidine-containing phosphocarrier protein of the 
phosphotransferase system (PTS) (HPr protein) 
3701 ribokinase (ribose metabolism) 
3902 sucrase-6-phosphate hydrolase 
3535 levansucrase 
2759 levanase 

3941 negative regulatory protein of SacY 
851 trehalose-6-phosphate hydrolase 

S^ oskJase/a * t " arabinosidase(xy,ande9rada * 

189! xylose isomerase (xylose metabolism) 
1893 xylulose kinase (xylose metabolism) 
2054 endo-1,4-xylanase (xylan degradation) 
888 xylan β-1,4-xylosidase (xylan degradation) 
1945 endo-1,4-xylanase (xylan degradation) 
lei polysaccharide deacetylase 
^-hexosaminidase 

pjj^a^ne-fructose-e-phosphateaminotrans- 

glucosamine-6-phosphate isomerase 
5-dehydro-4-deoxyglucarate dehydratase 
aldehyde dehydrogenase 
glucarate dehydratase 
glucose 1-dehydrogenase 
oligo-1,6-glucosidase 
aromatic hydrocarbon catabolism 
Jfv B-glucosidase 

375 ^T n ° 3 * hexutose ^Phosphate formaldehyde 

aryl-alcohol dehydrogenase 

alcohol dehydrogenase 
acetyltransferase 
cellulose synthase 
pyruvate oxidase 
β-glucosidase 
fructokinase 

mannose-6-phosphate isomerase 
mannan endo- 1.4-mannosidase 
fructokinase 
L-iditol 2-dehydrogenase 
arylesterase 

methanol dehydrogenase regulation 
rhamnogalacturonan acetylesterase 
β-galactosidase 
epoxide hydrolase 
glucose 1-dehydrogenase 
«w polysaccharide deacetylase 
807 benzaldehyde dehydrogenase 
798 glucose-1-phosphate cytidylyltransferase 
958 reticuline oxidase 
997 phosphoglycolate phosphatase 
1022 glucose 1-dehydrogenase 
1030 aldo/keto reductase 
1041 endo-1,4-xylanase 
1095 glucanase 
1006 phosphomannomutase 
1115 alcohol dehydrogenase 
1118 ribitol dehydrogenase 
1164 myo-inositol 2-dehydrogenase 
1175 mandelate racemase 
1192 L-gulonolactone oxidase 
1274 mannose-6-phosphate isomerase 
1281 endo-1,4-xylanase 
1285 formate dehydrogenase 



183 
213 

258 
268 
269 
272 
305 
306 
352 
370 



466 
471 
473 
482 
488 
628 
631 
632 
632 
670 
679 

682 

688 

774 

774 

929 

937 

869 



1300 glucuronate isomerase 

1304 sorbitol dehydrogenase 

1305 D-mannonate hydrolase 
JKS f^^^uconata dehydrogenase 
1309 tagaturonate reductase 
1311 altronate hydrolase 
nS ^''chol phosphate mannose synthase 

366 chtoromuconatecyctoisomerase 

367 potysugar degrading enzyme 
M03 dolichoi phosphate mannose synthase 
M27 nrjulose-bisphosphate carboxylase 

^'"ositot- 1 (or 4HTronophospnatase 
M77 glucose 1 -dehydrogenase 
]"l gJucoso 1 -dehydrogenase 
1445 chmnase 

1653 noutose-S^hosprwte^epirnerase 
1741 deacetylase 
1943 endo-xylanase 
1951 propionyl-CoA carboxylase 
2023 xylulokinase 

isis sra«dr ,oflenase 

2507 rtephoeno/pyruvate mutase 
2488 prop.onyK)oAcart)oxy(ase " 
2780 formate dehydrogenase 
2780 formate dehydrogenase 
2778 methyttransferase 
2768 cyctodextrin metabolism 
2742 sugar-phosphate dehydrogenase 
2950 endo- 1,4— glucanase 
2932 glycolate oxidase subunit 
2934 glycolate oxidase subunit 

^metabolite dehydrogenase 

3156 NDP-sugar epimerase 
3024 acetate-CoA ligase 

VS ^'^e-lijhosphateuridytyttransferase 
3138 carbonic anhydrase 
3055 endo-Vt-giucanase 
2989 aceryl-CoA carboxylase 

JJy^olipoamide S-acetyltransferase 
S^^'bwarK.1 dehydrogenase 
v£ ^^pendent butanol dehydrogenase 
3Z15 exc—f,4^giucosfdase 
3200 rhamnulokinase 
3198 L-rhamnose isomerase 
3382 retinol dehydrogenase 
3318 rV-acetyt-glucosamine catabolism 

3568 N-hydroxyarylamine O-acetyltransferase 
3562 glycerate dehydrogenase 
3561 carbonic anhydrase 
3557 glucan 1,4-maltohydrolase 
3548 oligo-1,6-glucosidase 
3547 β-phosphoglucomutase 
3537 levanase 

SI ss^"^vh*«-*« 

3495 glycolate oxidase 
3427 plant-metabolite dehydrogenase 
3615 pyruvate, water dikinase 
3592 phosphoglycolate phosphatase 
3591 O-acetyltransferase 
3590 pectate lyase 

3664 UDP-N-acetylglucosamine 2-epimerase 
3895 aldehyde dehydrogenase 
3872 glucose dehydrogenase 
3305 glycerol-inducible protein 
3730 NDP-sugar dehydrogenase 
4091 glucose 1 -dehydrogenase 
4040 arabinan endo-1,5-L-arabinosidase 
4000 gluconate 5-dehydrogenase 
4107 glucose 1-dehydrogenase 
4202 formate dehydrogenase 
4196 galactoside acetyltransferase 
4136 formaldehyde dehydrogenase 

MAIN GLYCOLYTIC PATHWAYS 
3477 enolase (glycolysis) 
3808 fructose-1,6-bisphosphate aldolase (glycolysis) 

^g de ^ 3 -P^ s P h atedehydrc^9nase(g^ 
2967 ^Jj^^^^^'^e^ogenasefgly- 
4CTO fructose- 16-bisphosphate aldolase (gfycofysis) 
3129 Phosphoenolpyruvatecarboxykinase 
528 pyruvate deh y a;&genase {E 1 a subuniti 
^>_,S ^uv^^hydrogenasefEipsubunit)' 
P«<=U -153Q pyruvate dehydrogenase (dihydrolipoamide 

acet y ,tran sferaseE2 subunit) 
PdhD 1531 pyruvate dehydrogenase / 2-oxogiutara;e dehy- 
drogenase (dihydrolipoamide dehydrogenase E3 
Subunit) 

2987 6-phosphofructokinase (glycolysis) 
J22i giucose-6-pnosphate isomerase (glycolysis) 
£S S^ c ^te kinase (glycor/sis) ^ ' 
3478 phosphoglycerate mutase (glycolysis) 
1554 pyruvate carboxylase 
2986 pyruvate kinase (glycolysis) 
1919 transketolase (pentose phosphate) 
X« ^P ho fP^»isomerBse(gfycolysis) 
198 phosphoglucomutase (glycolysis) 

^^^^o^tedertydrc^enasefgiy- 

pIS P^ sp ^^ ratem «taM (glycolysis) 
2651 6-phosphogiuconate dehydrogenase (pentose 

pnospnate) 
IfS ^ in y c,rol 'Poarnide dehydrogenase 

^ p ^ t ^ luM ^ te ^r^ro6^ase(r^^ 

2478 pho^ e ^ h ^ a '° , " <,ehydr09enase (pentose 



yjmA 
yjmD 
yjmE 
yjmF 
yjmi 
yjmj 
ykcC 
ykfB 
ykfC 
ykoT 
ykrW 
yktC 
ykuF 
ykvO 
ykvQ 
yfoR 
ytxY 
ynfF 
yngE 
yoaC 
yoaD 
yoaE 
yoal 
yogA 
yQiO 
yqjD 
yrfiE 
yrtiG 
yrhH 
yrtiO 
yrpG 
ysdC 
ysfC 
ysfO 
ytbE 
ytcA 
yxcB 
yxcl 
ytdA 
ytiB 
ytoP 
ynl 
yugF 
y"9J 
yugK 
yugr 
yu/C 
yulE 
yusZ 
yvtF 
yuxG 
yvaM 
yvcN 
yvcT 
yvdA 
yvdF 
yvdL 
yvdM 
yveB 
yvfO 
yvfO 
yvfV 
yvgN 
yvkC 
yvoE 
yvoF 
yvpA 
yvyH 
ywdH 
ywfD 
y*ji 
y*qF 
yxbG 
yxiA 
yxjF 
yxnA 
yyaE 
yyaf 
yycR 



mmgD 

odhA 

odhB 

sdhA 
sdhB 
sdhC 

sucC 

sucD 

yjmC 

yqkJ 

ytsJ 

ywkA 

IL2 



11.1.2 

eno 

fbaA 

ftp 

gap 

gapB 



ioU 

pckA 

pdhA 



pfk 
pgi 

PG< 
pgm 
pycA 
pykA 
ikt 
tpi 

ytbT 
yOeA 

yhfR 

yvec 

yq:V 



giyA 

NsA 

ht'sB 
hisC 



hisD 
hisF 



hisH 



2510 citrate synthase III 

1303 malate dehydrogenase 

2452 malate dehydrogenase 

2990 malate dehydrogenase 

3801 malate dehydrogenase 

3277 L-3laninelie'hydrogenase'"" r 205 

1516 aminopeptidase 

2456 L-asparaginase 

2455 L-aspartase 

b^s^ a ^ araf ^ Mfa ^ 

2142 acerylrxniminedeacetytasefarainineb^^ 
12a3 ^^ e ^ arT ^^^^nebio^ 
3013 ^^osuccinatBSyntha^a^nebk^rhe. 

1196 omithTOacetyitransfefase/an^ 
transferase (arginine biosynthesis) 
™^^ bino ^ eptulosonate 7-phospha:e 

^ehydroquinate synthase (shikimate pathway) 
2413 3-dehydrw*™ 

2645 shikimateWef.ydrc^setshikirnatepatrwva^^ 

2368 US^T^ m « x 

2380 chorismate synthase (shikimate pathway) 
2377 chorismate mutase (isozymes 1 and 2) (aromatic 
amino acids biosynthesis) 

340 shikimate kinase (shikimate pathway) 
it ^^T sem i aWe ^ edeh y dfD 9enase 
2910 aspanokmasellanenuaajr 
3127 asparagine synthetase 
4098 asparagine synthetase 
2348 aspartate aminotransferase 

metSsT ,C ^^^ e(P ^ aIanine 
bfmBAA 2499 branched^hain a-keto add dehydrogenase El 
bfmBAB 2493 KSS"??" 1 deh yd^09enasV asVbunit) 
ormtJAB 2493 branched^am a-keto acid dehydrogenase El 

2497 branche^ha.n a-keto acid dehydrogenase E2 

subunit (lipoamideacyttransferase) 
7™ ^ per ( mine/ spermidine acetyftransferase 
1599 baciilopeptidaseF 
1535 lysine decarboxylase 

2133 cartxwjMerminal processing protease 
Sn ^ e «*J*™n?«te«e (cysteine biosynthes^) 
1630 P^sphoadenos.nephosphosuKatereductas 
(cysteine biosynthesis) 

??7 S ines ^ metaseA tcys'einebiosynthesis) 
oi7 c-aianme racemase 
1748 dihydrodipicolinate synthase 

Lih a ^? ime ! ate/,ysine ^osyntnesis) 
2359 dihydrodipicolinate reductase 

(dtammopimelate/Jysine biosynthesis) 
SI as P ano,tin aseI(aandpsubunits) 
1646 polypeptide deformyiase 
3939 minor extracellular serine protease 

,ef a s am ' ne ^ frirct05e *P nos P h ateemidotrans- 
1873 glutamine synthetase 

2009 bio?ynS s 7^ Se(SmaaSU ^^^ 

3789 fni)?h ^^^^^nsferasetgfyci'ne/ser- 
<ne/threontne metabolism) 

3534 Phosphoribosyllormimino-s-aminoimidazole car- 
thesisT •sornerase (histidine biosyrv 

2371 h.sydinol- P ho$ P heie aminotransferase (histkJire 
biosynthesis) / tyrosine and phenylalanine 
aminotransferase 

3583 !iS?;ci d ^ 

'rnrdazoleotycerolphosDharel 



aid 

amps 

ansA 

ansB 

BprE 

aprX 

orgB 

ergC 

8rgD 

argE 

ergF 

argG 

argH 
argj 

aroA 



aroB 
aroC 

aroD 
aroE 

aroF 
aroH 

arot 

asd 

ask 

asnB 

asnH 

aspB 

bcsA 



bfmBB 

bltO 
.. bpr 
cad 
carA 

carB 

ctpA 
cysE 
cysH 

cysK 
dat 
dapA 

dapB 

dapG 
def 
epr 
gtmS 

gtnA 
gttA 



yoj) 

y\\;H 
ywiF 



H' 'v&pfiaiej 
3807 transaldoiase (pentose phosphate) 
3791 nboseSiihosphateeptmerasefper 
phate) 



IU3 
CslA 
oiB 
ore 
cilG 
OtH 
OtZ 
maiS 



phate) 
TCA CYCLE. 



sntosephos- 



1021 citrate synthase I 
1926 aconitate hydratase 
5™ ^o^ate dehydrogenase 
3389 fumarate hydratase 
2979 malate dehydrogenase 
2981 citrate synthase II 
3058 malate dehydrogenase 



horn 

hutG 

hutH 
hutf 

hutU 

itvA 

itvB 

ifcC 

ifvO 

ifvN 



'mtrjazole glycerol phosphate) 
^ T e 5 ( ^ osp ^ ri ^^^ an s^^(histidinebiosyn- 

$S am ^°i: an ? fer ase (histidine biosynthesis) 

^!5^ s ^ AMP ^°^oia M /pnosph^ 

33,5 S^ s r r ^ 

4045 tor^hogiuiamate hydrolase (WsMto 

4041 histidase (histidine utidzafion) ■ 

4044 i^^° n ^^^^^(^ine^ 

4042 urocanase(histidrne utilization) 

(vaiine/isoievranebiosyhthesis) . 
o^y^^^^^^'^ne 

2302 SSs^' 6 ^ 

2894 acetolactate synthase (small subunit) 



flgE 1700 flagellar hook protein 
flgK 3639 flagellar hook-associated protein 1 (HAP1) 
flgL 3637 flagellar hook-associated protein 3 (HAP3) 
flgM 3640 flagellin synthesis regulatory protein (anti-sigma 
factor [σD]) 
flhA 1707 flagella-associated protein 
flhB 1706 flagella-associated protein 
flhF 1709 flagella-associated protein 
flhO 3746 flagellar basal-body rod protein 
flhP 3745 flagellar hook-basal body protein 
fliD 3633 flagellar hook-associated protein 2 (HAP2) 
1692 flagellar hook-basal body protein 
fliF 1692 flagellar basal-body M-ring protein 
fliG 1694 flagellar motor switch protein 
fliH 1696 flagellar assembly protein 
fliI 1695 flagellar-specific ATP synthase 
fliJ 1697 flagellar protein required for formation of basal 
body 
fliK 1696 flagellar hook-length control 
fliL 1701 flagellar protein required for flagellar formation 
fliM 1701 flagellar motor switch protein 
fliP 1704 flagellar protein required for flagellar formation 
fliQ 1706 flagellar protein required for flagellar formation 
fliR 1705 flagellar protein required for flagellar formation 
fliS 3632 flagellar protein 
fliT 3632 flagellar protein 
fliY 1702 flagellar motor switch protein 
fliZ 1704 flagellar protein required for flagellar formation 
hag 3635 flagellin protein 

mcpA 3207 methyl-accepting chemotaxis protein (glucose 
and α-methylglucoside) 
mcpB 3212 methyl-accepting chemotaxis protein 
(asparagine, glutamine and histidine) 
mcpC 1463 methyl-accepting chemotaxis protein (cysteine, 
proline, threonine, glycine, serine, lysine, valine 
and arginine) 
1435 motility protein (flagellar motor rotation) 
1434 motility protein (flagellar motor rotation) 
3209 methyl-accepting chemotaxis protein 
3205 methyl-accepting chemotaxis protein 
374 methyl-accepting chemotaxis protein 
808 methyl-accepting chemotaxis protein 
1113 methyl-accepting chemotaxis protein 
1679 flagellar biosynthetic protein 
1699 flagellar hook assembly protein 
1710 flagellar biosynthesis switch protein 
2030 methyl-accepting chemotaxis protein 
3043 flagellar motor apparatus 
3042 motility protein 

3457 transmembrane receptor taxis protein 
3634 flagellar protein 
3640 flagellar protein 
3639 flagellar protein 
3609 flagellin 

PROTEIN SECRETION 



motA 
motB 
VpA 
tlpB 
VpC 
yfmS 
yhN 
ykjH 
yi*G 
ytxH 
yoaH 
yuD 
ytxE 
yvaO 
yvyC 
yvyF 
yvyG 
yvzB 

1.6 
csaA 
ffh 
ftsY 
Isp 
tytA 
prsA 
secA 
secE 
secF 

secY 

sipS 
sipT 
sipU 
sipV 

SJpW 

yaaT 
yacD 
yobE 

1.7 

dMB 

dMC 

dMVA 

fisA 

fisE 

ftsH 

ttsL 

ftsX 

itsZ 

gid 

gidA 

gidB 

mat 

minC 

minD 

yacA 
yfhF 
yjoB 
ytaO 
ylmH 
ywcF 

1.8 

boiA 
bofC 
cgeA 
cgeB 
cgeC 
cgeD 
cgeE 
cotA 
corS 
core 
cofD 
corf 
cctF 
corG 
cotH 
coVA 
ooUB 
ooVC 
cotK 
CotL 
COtM 
COtN 

corS 
corT 
con/ 
cortv 



2079 chaperonin involved in protein secretion 
1672 signal recognition particle 
1670 signal recognition particle 
1616 signal peptidase II 
3662 secretion of major autolysin LytC 
1071 protein secretion (post-translocation chaperonin) 
3630 preprotein translocase subunit 
118 preprotein translocase subunit 
2828 protein-export membrane protein (product also 
similar to SecD of E. coli) 
145 preprotein translocase subunit 
2432 signal peptidase I 
1511 signal peptidase I 
454 signal peptidase I 
1122 signal peptidase I 
2554 signal peptidase I 
42 signal peptidase II 
81 protein secretion PrsA homologue 
2057 general secretion pathway protein 



Mi - sic re ica: r. (;nso-oD.e tracumj 

1250 soceccatorote^finsoiLbie fraction) sspE 
i2-9 soo r e coat protein (insoluble fraction) 

228 SDOfulatiorvspecificSASP protein SSpi- 
4Z:3 SootiiJ-associated protein 

3230 activator of <mB in the initiation of sponjlaiion usd 

3232 inhibitor of the KinA pathway to sporulation yknT 

1 59 activation of the Kin8 signaling pathway to sporu- ykvU 

laton ynzH 

2353 GTP-btndingproiein involved in initiation of sporu- yObW 

iaton(SooOA activation) VQQj 

1316 onosohatase(RapA) inhibitor (imported by Opp) yqjG 

430 Dnosphatase (RapC) regulator / competence and yraD 

soonjiation stimulating factor (CSF) Y Ta ^ 

2660 Dhosohatase(RapE) regulator yraF 

33^6 onospnatase (RapF) regulator yraG 

4U! pnospnatase(RapG)regulator yroA 

543 Dho$phatase<Rapl)regulator yrbB 

2063 pnospnatase(RapK) regulator yrbC 

1315 resoonse regulator aspartate phosphatase ytaA 

(SooOF-P] #9% 

377i resoonse regulator aspartato phosphatase ytpi 

[Sdoof-p] yyoA 

428 response regulator aspartate phosphatase 
3743 response regulator aspartate phosphatase L9 
2558 response regulator aspartate phosphatase gerAA 
3845 response regulator aspartate phosphatase gerAB 
4139 response regulator aspartate phosphatase gerAC 
750 response regulator aspartate phosphatase gerBA 
547 response regulator aspartate phosphatase 
304 response regulator aspartate phosphatase gerBB 
2061 response regulator aspartate phosphatase 
2552 antagonistofSinR gerBC 
4206 centromere-like function involved in forespore • ■ 

enromosome partitioning / inhibition of SpoOA gerCA 
activation . 

1461 spore photoproduct lyase gerCB 
2423 spore maturation protein {spore core dehydrata- 

uon) gerCC 
2422 soore maturation protein (spore core dehydrata- 

ticn) - gerD 

2854 sporulation initiation pnosphoproteinjpart of 

pnosphorelay. Spo0F-F^>Spo0B-Mpof^-P) 
1430 negative sporulation regulatory phospnwa9ft_ .3 m gerKA 
ISpoOA-P] 

4206 chromosome positioning near the pote and trans- gerKB 
pon through the polar septum / antagonist of Soj 
spoliAA 2444 anti-anti-sigma factor [SpollAB] gerKC 
spollA3 2444 ar.ti-sigma factor [o : (SpoliAC)) and serine kinase 

[SpoliAA) gerM 
spoliB 2864 endospore development (oligosporogenous 

mutation) Opr 
spoltD 3777 required lor complete dissolution of the asymmet- sle8 
nc septum yfkQ 
spoiiE 71 serine phosphatase [SpollAA-P] (a* activation)/ yfkR 
asymmetric septum formation yfkT 
spoliGA 1603 protease (processing of pronto active <r] ykvT 
spoilt AA 2537 mutants block sporulation after engulfment yndD 
spolltA3 2536 mutants block sporulation after engutfment yndE 
spoMAC 2535 mutants block sporulation after engulfment yndF 
spolltAD 2535 mutants block sporulation after engulfment 
spotltAE 2535 mutants block sporulation after engutfment L10 
spoWAF 2534 mutants block sporulation after engulfment cinA 
spottlAG 2533 mutants block sporulation after engulfment corr,C 
spoMAH 2532 mutants block sporulation after engulfment 
spolHE 1752 DNA translocase required for chromosome parti- 
tioning through the septum into the forespore 
4214 essential for activity at stage III 
2450 required for dissolution of the septal ceil wall 
2634 required for dissolution of the septal cell wall 
3760 required for completion of engutfment 

^ 3794 required lor processing ol pro-t^ 

spoilSA 1349 tetnalvKhen synthesized during vegetative growth 

in the absence of SpollSB 
spoliSB 1348 disruption blocks sporulation after septum forma-, 



COl'. 
CCsV 

coiZ 
'csgA 
jag 
kapB 
kapD 
kbaA 

obg 

phrA 
phtC 

phrE 
phrr 
ptvG 
phri 
phrK 
rapA 

rapB 

rapC 
rapD 
rapE 
rapF 
rapG 
rapH 
rapt 
rap) 
rapK 
s'mt 
soj 



spiB 
spmA 

spmB 

spoOB 

spoOE 

SpoQJ 



937 sma.t acc-scuse spore protem (major rf/pe 

SAS=) t t 

53 small ac ; a-sox.bie spore prote* (minor o/B-type 

SASP) 

3748 required fcr v ansiation of spotUD 

1495 sporulation prote:n o f -controaed 

1449 spore conex membrane protein 

19C1 spore coa:? r 3tein 

2083 memb'are yctein o*-contro3ed 

2568 yo-gtutamyi-.-o amino acid endopeptidase I 

2483 hpoprot&n SpoilU-like 

2754 spore coat protein 
2754 spore coat protein 
2752 spore coat protein 
2752 spore coat protein 
2845 spore coat protein 
2844 spore coat protein 
2843 spore coat protein 
3161 spore coat protein 

3074 spore core* orotein 

3051 DNA transtocase stage 111 sporulation protem 

4208 DNA-binoing orotein SpoQWike 

GERMINATION 23 



3390 germination response to L-alanine 
3391 germination response to L-alanine 
3392 germination response to L-alanine 
3683 germination response to the combination of glucose, 
fructose, L-asparagine, and KCl 
3689 germination response to the combination of glucose, 
fructose, L-asparagine, and KCl 
3690 germination response to the combination of glucose, 
fructose, L-asparagine, and KCl 
2384 heptaprenyl diphosphate synthase component I 
(menaquinone biosynthesis) 
2383 menaquinone biosynthesis methyltransferase 
(menaquinone biosynthesis) 
2382 heptaprenyl diphosphate synthase component II 
(menaquinone biosynthesis) 
159 germination response to L-alanine and to the 
combination of glucose, fructose, L-asparagine, 
and KCl 
420 germination response to the combination of glucose, 
fructose, L-asparagine, and KCl 
423 germination response to the combination of glucose, 
fructose, L-asparagine, and KCl 
421 germination response to the combination of glucose, 
fructose, L-asparagine, and KCl 
2902 germination (cortex hydrolysis) and sporulation 
(stage II, multiple polar septa) 
2635 spore protease (degradation of SASPs) 
2399 spore cortex-lytic enzyme 
850 spore germination response 
848 spore germination protein 
847 spore germination protein 
1448 spore cortex-lytic enzyme 
1907 spore germination protein 
1908 spore germination protein 
1909 spore germination protein 

TRANSFORMATION /COMPETENCE 20 



spoIIIJ
spoIIM
spoIIP
spoIIQ

spoIIR



tion 



CELL DIVISION 21

1593 cell-division initiation protein (septum formation)
69 cell-division initiation protein (septum formation)
1612 cell-division initiation protein (septum placement)

1596 cell-division protein (septum formation)
3625 cell-division ATP-binding protein

77 cell-division protein / general stress protein (class

III heat-shock)
1581 cell-division protein (septum formation)
3624 cell-division protein

1597 cell-division initiation protein (septum formation)
1685 glucose-inhibited division protein

4211 glucose-inhibited division protein
4209 glucose-inhibited division protein
2862 septum formation
2859 cell-division inhibitor (septum placement)
2858 cell-division inhibitor (septum placement)

(ATPase activator of MinC)
75 cell-cycle protein
925 cell-division inhibitor
1314 cell-division protein FtsH homologue
1552 cell-division protein
1611 cell-division protein
3912 cell-division protein



SPORULATION 139 

30 inhibitor of the pro-σK processing machinery
2837 forespore regulator of the σK checkpoint
2148 maturation of the outermost layer of the spore 
2148 maturation of the outermost layer of the spore 
2148 maturation of the outermost layer of the spore 
2147 maturation of the outermost layer of the spore 
2146 maturation of the outermost layer of the spore 
685 spore coat protein (outer)

3715 spore coat protein (outer) 
1905 spore coat protein (outer) 
2332 spore coat protein (inner) 
1774 spore coat protein (outer) 
4166 spore coat protein 

3716 spore coat protein 

3716 spore coat protein (inner)

755 polypeptide composition of the spore coat 

756 polypeptide composition of the spore coat 
756 polypeptide composition of the spore coat
1926 spore coat protein 

1926 spore coat protein 

1925 spore coat protein (outer) 

2553 spore coat-associated protein 

3160 spore coat protein 

1280 spore coat protein (inner) 

1251 spore coat protein (insoluble fraction) 

1251 spore coat protein (insoluble fraction) 



spoIVA 2387 required for proper spore cortex formation and
coat assembly

spoIVB 2520 intercompartmental signalling of pro-σK
processing/activation in the mother-cell

spoIVCA 2654 site-specific DNA recombinase required for creating
the sigK gene (excision of the skin element)

spoIVFA 2857 inhibitor of SpoIVFB

spoIVFB 2856 protease (processing of pro-σK to active σK)

spoVAA 2443 mutants lead to the production of immature 
spores 

spoVAB 2442 mutants lead to the production of immature
spores

spoVAC 2441 mutants lead to the production of immature
spores

spoVAD 2441 mutants lead to the production of immature
spores 

spoVAE 2440 mutants lead to the production of immature 
spores 

spoVAF 2439 mutants lead to the production of immature 
spores 

spoVB 2829 involved in spore cortex synthesis
spoVC 60 thermosensitive mutant blocks spore coat formation

spoVE 1590 required for spore cortex synthesis
spoVFA 1744 dipicolinate synthase subunit A
spoVFB 1745 dipicolinate synthase subunit B
spoVG 56 required for spore cortex synthesis
spoVID 2872 required for assembly of the spore coat
spoVK 1873 disruption leads to the production of immature 
spores 

spoVM 1655 required for normal spore cortex and coat synthe- 
sis 

spoVR 1015 involved in spore cortex synthesis 

spoVS 1769 required for dehydration of the spore core and

assembly of the coat
spsA 3892 spore coat polysaccharide synthesis
spsB 3891 spore coat polysaccharide synthesis
spsC 3890 spore coat polysaccharide synthesis
spsD 3889 spore coat polysaccharide synthesis
spsE 3888 spore coat polysaccharide synthesis
spsF 3887 spore coat polysaccharide synthesis
spsG 3886 spore coat polysaccharide synthesis
spsI 3885 spore coat polysaccharide synthesis
spsJ 3884 spore coat polysaccharide synthesis
spsK 3883 spore coat polysaccharide synthesis
sspA 3025 small acid-soluble spore protein (major α-type

SASP)

sspB 1050 small acid-soluble spore protein (major β-type
SASP)

sspC 2156 small acid-soluble spore protein (minor α/β-type
SASP)

sspD 1413 small acid-soluble spore protein (minor α/β-type



1763 competence-damage inducible protein 
2864 late competence protein required for processing 
and translocation of ComGC 
comEA 2640 late competence operon required for DNA binding
and uptake

comEB 2640 late competence operon required for DNA binding
and uptake

comEC 2639 late competence operon required for DNA binding
and uptake
comER 2640 non-essential gene for competence
comFA 3643 late competence protein required for DNA uptake
comFB 3641 late competence gene
comFC 3641 late competence gene
comGA 2559 late competence gene
comGB 2558 DNA transport machinery
comGC 2557 exogenous DNA-binding
comGD 2557 DNA transport machinery
comGE 2557 DNA transport machinery
comGF 2556 DNA transport machinery
comGG 2556 DNA transport machinery
comS 390 assembly link between regulatory components of

the competence signal transduction pathway
comX 3255 competence pheromone precursor (activation of
ComA)

mecA 1229 negative regulator of competence
ypbH 2403 negative regulation of competence, MecA homologue

II.1 METABOLISM OF CARBOHYDRATES AND RELATED

MOLECULES - 26 : 

II.1.1 SPECIFIC PATHWAYS -2 U

abfA 2939 α-L-arabinofuranosidase

abnA 2949 arabinan-endo 1,5-α-L-arabinase (degradation of

plant cell wall polysaccharide)
ackA X15 acetate kinase

acoA 879 acetoin dehydrogenase E1 component

(TPP-dependent α subunit)
acoB 880 acetoin dehydrogenase E1 component

(TPP-dependent β subunit)
acoC 881 acetoin dehydrogenase E2 component

(dihydrolipoamide acetyltransferase)
acoL 882 acetoin dehydrogenase E3 component

(dihydrolipoamide dehydrogenase)
acsA 3039 acetyl-CoA synthetase
acuA 3039 acetoin utilization
acuB 3040 acetoin utilization
acuC 3040 acetoin utilization
adhA 2756 NADP-dependent alcohol dehydrogenase
adhB 2753 alcohol dehydrogenase
aldX 4093 aldehyde dehydrogenase
aldY 3985 aldehyde dehydrogenase
alsD 3709 α-acetolactate decarboxylase (acetoin biosynthesis)

alsS 3710 α-acetolactate synthase (acetoin biosynthesis)
amyE 327 α-amylase

amyX 3063 pullulanase
araA 2948 L-arabinose isomerase (L-arabinose utilization)
araB 2946 L-ribulokinase (L-arabinose utilization)
araD 2945 L-ribulose-5-phosphate 4-epimerase (L-arabinose

utilization)
araL 2944 L-arabinose operon
araM 2943 L-arabinose operon
bglA 4122 6-phospho-β-glucosidase
bglC 1940 endo-1,4-β-glucanase (cellulose degradation)



ydeR 
yxffA 
ydfJ 
ydfL 
ydfM 
ynfO 
ydgP 
ydgH 
ydgK 
ytiht 
ytihM 
ydbN 
yChO 
yd,F 
ydjD 

ye a 8 
yecA 
yesO 
yesP 
yesQ 
yfhA 
yfhl 
yHB 
yfiC 
yfiG 
yf<L 
yiiM 
yftN 
yfiS 
yfiU 
yfiY 
yfiZ 
yfjQ 
yikE 
yikF 
yfkH 
yfki 



578 
580 
589 
595 
596 
597 
608 
609 
613 
626 
625 
627 
627 



676 
687 
7!2 
761 
762 
763 
921 
926 



900 
905 
906 
907 
913 
916 
920 
920 
872 



/OA 
y*!E 
yfiF 
yfl3 
yfmC 
yimD 
yimt 
yfmF 
yfmM 
yimO 
yfmR 
yinA 
ygaO 

ygai 
ygaM 
ygbA 
yhaO 
yhaU 
yhcA 
yhcC 
yhcH 
yhcJ 
yf.ct 
yhdG 
yhdH 
yheH 
yhel 
yhei 
yhfQ 

y^B 

yhjO 
yhjP 

yilZ 
yjbO 
yjdD 

yj'<B 

yjmB 
yjmG 
y*aB 
ykbA 
ykcA 
ykfO 
yknU 
yknV 
yxnY 
ykoD 
ykoK 
ykpA 

ykrtJ 
ykuC 

ykvW 

yimA 

yinA 

ylcB 

ynaJ 

yncC 

yvcN 

yocR 

yocS 

yodE 

yodF 

yoiA 

yPQE 

yqeW 

W9G 

yq§H 

yVQt 

yogj 

K9< 

VQX 
yqiY 
yviZ 
WV 
yqxt 
yraO 
yrbD 
ytbD 
yicP 
ytcO 
yreO 
ytgA 
yrgs 
ytgC 

ytfiP 
yt!C 

ytD 
yep 



844 
844 
840 
829 
826 
825 
824 
823 
815 
812 
809 
806 
939 
961 
963 



antibiotic resistance protein 
arsenical pump membrane protein 
antibiotic transport-associated protein 
multidrug-efflux transporter regulator
cation efflux system
ABC transporter (binding protein) 
amino acid ABC transporter (permease) 
transporter 

bicyclomycin resistance protein
chloramphenicol resistance protein
cellobiose phosphotransferase system enzyme II
cellobiose phosphotransferase system enzyme II
cellobiose phosphotransferase system enzyme II
646 ABC transporter (ATP-binding protein)
668 H+-symporter
sugar transporter 

cation efflux system membrane protein 
amino acid permease 
sugar-binding protein 
lactose permease 
lactose permease 
iron(III) dicitrate transport permease
antibiotic resistance protein
ABC transporter (ATP-binding protein)
ABC transporter (ATP-binding protein)
metabolite transport protein
ABC transporter (ATP-binding protein)
ABC transporter (ATP-binding protein)
ABC transporter (ATP-binding protein)
multidrug resistance protein
multidrug-efflux transporter
iron(III) dicitrate transport permease
iron(III) dicitrate transport permease
divalent cation transport protein
H+/Ca2+ exchanger
multidrug-efflux transporter
transporter 

multidrug resistance protein 
amino acid carrier protein
anion-binding protein
phosphotransferase system enzyme II
2-oxoglutarate/malate translocator
ferrichrome ABC transporter (binding protein)
ferrichrome ABC transporter (permease)
ferrichrome ABC transporter (permease)
ferrichrome ABC transporter (ATP-binding protein)
ABC transporter (ATP-binding protein)
multidrug-efflux transporter
ABC transporter (ATP-binding protein)
metabolite transporter
ABC transporter (ATP-binding protein)
nitrate ABC transporter (binding protein)
ABC transporter (permease)
962 ABC transporter (binding lipoprotein)
1062 ABC transporter (ATP-binding protein)
1060 Na+/H+ antiporter
977 multidrug resistance protein 

981 glycine betaine/L-proline transport

982 ABC transporter (ATP-binding protein)
984 ABC transporter (binding lipoprotein)
986 sodium-glutamate symporter

1023 amino acid transporter

1024 sodium-dependent transporter
1047 ABC transporter (ATP-binding protein)
1045 ABC transporter (ATP-binding protein)
1044 Na+/H+ antiporter
1107 iron(III) dicitrate-binding protein
1120 metabolite permease
1133 multidrug-efflux transporter
1133 transporter binding protein
1177 multidrug resistance protein
1194 multidrug resistance protein
1240 Na+/H+ antiporter

1272 fructose phosphotransferase system enzyme II
1296 amino acid ABC transporter (ATP-binding protein)
1301 Na+/galactoside symporter
1307 hexuronate transporter
1350 low-affinity inorganic phosphate transporter 

1352 amino acid permease 

1353 ABC transporter (binding protein) 
1368 oligopeptide ABC transporter (permease)
1499 ABC transporter (ATP-binding protein)
1501 ABC transporter (ATP-binding protein)
1505 ABC transporter (ATP-binding protein)
1390 cation ABC transporter (ATP-binding protein)
1395 Mg2+ transporter
1512 ABC transporter (ATP-binding protein)
1415 Na+-transporting ATP synthase
1476 macrolide-efflux protein
1451 heavy metal-transporting ATPase
1606 ABC transporter (ATP-binding protein)
1630 anion permease
1637 calcium-transporting ATPase
1387 H+-symporter
1896 metabolite transport protein 
2098 permease 

2106 sodium-dependent transporter
2106 sodium-dependent transporter

2129 aromatic metabolite transporter

2130 proline permease
2125 gluconate permease
2337 phosphotransferase system enzyme II
2620 Na+/Pi cotransporter
2581 phosphate ABC transporter (binding protein)
2580 phosphate ABC transporter (permease)
2579 phosphate ABC transporter (permease)
2578 phosphate ABC transporter (ATP-binding protein)
2577 phosphate ABC transporter (ATP-binding protein)
2515 lipoprotein

2492 amino acid ABC transporter (binding protein)
2491 amino acid ABC transporter (permease)
2491 amino acid ABC transporter (ATP-binding protein)
2466 multidrug resistance protein
2453 Na+/H+ antiporter
2745 citrate transporter 

2841 sodium/proton-dependent alanine carrier protein
2968 antibiotic resistance protein 
3087 ABC transporter (permease) 
3086 lipoprotein 
3082 sugar transport protein 
3145 ABC transporter (membrane protein) 
3144 ABC transporter (ATP-binding protein) 
3143 ABC transporter (membrane protein) 
3071 ABC transporter (ATP-binding protein) 

3132 anion transport ABC transporter (ATP-binding protein)

3133 ABC transporter (permease) 
3065 ABC transporter (permease) 



ytmj 
ymK 
ytml 
ytmM 
yinA 
ytrB 
ytrE 
ytsC 
ytsD 
yvB 
yubD 
yvbG 
yufN 
yvfO 
yvfR 
yufU 
yufV 
yvgO 
yunj 
yunK 
yvrj 

yjrM 
yurN 
yurO 
yvrY 
yvsC 
yusP 
yusV 
yutK 
yuxj 
yvaE 
yvbW 
yvcC 
y*cR 
yvcS 
yvtid 
yvcG 
yvdH 
yvdl 
yveA 
yvfH 
yvfK 
yvfL 
yviM 
yvfR 
yvgK 

yvgL 

yvgM 
yvgW 
yvgX 
yxgY 
y/<A 
yvmA 

yvay 

yvrA 
>vrS 
yvrC 
yvrO 
yvsH 
ywbA 
y\\bF 
ywcA 
y\\ti 
ywfA 
ywiF 
ywnQ 
y\\}A 
y.wA 
yVfOD 
ywof 
ywoG 
y*pC 

ywrA 
yv.rB 
ywrK 
ywrG 
yxaM 
yxcC 
yxdL 
yxdM 
yxeB 
yxeM 
yxeN 
yxeO 
yxeR 
yxtO 
yxjA 
yxkj 



yxlH 
yyaJ 
yycF 
yybj 
yybL 
yytO 
yycB 
yyot 
yyzE 

1.3 

cheA 

citS
comP 

degS 



kinA
kinB
kinC

lytS
phoR 
resE 



ybdK
ycbA
ycbM
yccG 



3007 amino acid ABC transporter (binding protein)
3006 amino acid ABC transporter (binding protein)
3006 amino acid ABC transporter (permease)
3005 amino acid ABC transporter (permease) 
3125 proline permease 
3118 ABC transporter (ATP-binding protein) 
3115 ABC transporter (ATP-binding protein) 
3111 ABC transporter (ATP-binding protein) 
3110 ABC transporter (permease) 
3108 multidrug resistance protein 
3192 multidrug resistance protein
3188 Na+-transporting ATP synthase

3239 ABC transporter (lipoprotein)

3240 ABC transporter (ATP-binding protein)
3244 organic acid transport protein

3248 Na+/H+ antiporter

3249 Na+/H+ antiporter
3218 potassium channel protein 

3330 purine permease 

3331 purine permease 

3345 multiple sugar ABC transporter (ATP-binding protein)

3348 sugar permease 

3349 sugar permease 

3350 multiple sugar-binding protein 
3360 ABC transporter (ATP-binding protein) 
3363 ABC transporter (ATP-binding protein) 
3374 multidrug-efflux transporter
3379 iron(III) dicitrate transport permease
3307 Na+/nucleoside cotransporter
3232 multidrug-efflux transporter
3448 multidrug-efflux transporter
3490 amino acid permease 
3579 ABC transporter (ATP-binding protein) 
3565 ABC transporter (ATP-binding protein) 
3565 ABC transporter (permease) 
3561 transporter 

3555 maltose/maltodextrin-binding protein 
3554 maltodextrin transport system permease
3552 maltodextrin transport system permease
3538 permease
3510 L-lactate permease
3508 maltose/maltodextrin-binding protein

maltodextrin transport system permease
3505 maltodextrin transport system permease
3498 ABC transporter (ATP-binding protein)
3424 molybdenum-binding protein

3424 molybdate-binding protein

3425 molybdenum transport permease
3440 heavy metal-transporting ATPase
3443 heavy metal-transporting ATPase
3443 mercuric transport protein
3618 multidrug-efflux transporter
3605 transporter
3399 macrolide-efflux protein

3402 iron transport system 

3403 iron permease 
3403 iron-binding protein 

^ ! oiC° acid Aec transporter (ATP-binding protein) 
3420 ABC transporter (amino acid permease) 
3938 phosphotransferase system enzyme II
3933 sugar permease
3923 Na+-dependent symport
3904 nitrite transporter 
3874 chloramphenicol resistance 
3869 efflux protein 

3837 ABC transporter (ATP-binding protein) 
3821 ABC transporter (ATP-binding protein) 
3758 bacteriocin transport permease 
3754 transporter 
3753 permease 

3749 antibiotic resistance protein
3743 large conductance mechanosensitive channel
protein

3721 chromate transport protein
3720 chromate transport protein
3712 arsenical pump membrane protein
3693 metabolite transport protein
4100 antibiotic resistance protein
4087 metabolite transport protein
4070 ABC transporter (ATP-binding protein)
4069 ABC transporter (permease)
4066 ABC transporter (binding protein)
4059 amino acid ABC transporter (binding protein)
4058 amino acid ABC transporter (permease)
4058 amino acid ABC transporter (ATP-binding protein)
4054 ethanolamine transporter
4009 Mg2+/citrate complex transporter
4005 pyrimidine nucleoside transport
3979 metabolite-sodium symport
3970 purine-cytosine permease

3968 ABC transporter (ATP-binding protein) 
3966 muitidrug-efflux transporter 
4194 transporter 
4180 antibiotic resistance protein 
4175 ABC transporter (ATP-binding protein) 
4174 ABC transporter (permease) 
4169 ABC transporter (permease)
4159 ABC transporter (permease)
4125 ABC transporter (ATP-binding protein)
4122 phosphotransferase system enzyme II

SENSORS (SIGNAL TRANSDUCTION) 3a 

1712 two-component sensor histidine kinase

[CheB/CheY] chemotactic signal modulator
830 two-component sensor histidine kinase [CitT]
3255 two-component sensor histidine kinase [ComA]

involved in early competence
3646 two-component sensor histidine kinase [DegU]

involved in degradative enzyme and competence

regulation

1469 two-component sensor histidine kinase [Spo0F]

involved in the initiation of sporulation
3229 two-component sensor histidine kinase [Spo0F]

involved in the initiation of sporulation
1518 two-component sensor histidine kinase [Spo0A]

involved in the initiation of sporulation

(phosphorelay-independent)
2957 two-component sensor histidine kinase [LytT]

involved in the rate of autolysis
2977 two-component sensor histidine kinase [PhoP]

involved in phosphate regulation
2416 two-component sensor histidine kinase [ResD]

involved in aerobic and anaerobic respiration
222 two-component sensor histidine kinase [YbdJ]
266 two-component sensor histidine kinase [YcbB]
279 two-component sensor histidine kinase [YcbL]
295 two-component sensor histidine kinase [YccH]



yctK 
ydbF 
ydfH 
yesM 

yfU 
yhcY 

ykoH 

ykfO 

yk\D 

yocF 

yrkO 

ytrP 

ytsB 

yvfL 

yvcQ 

yvfT 

yvqB 

yvqE 

yvrG 

ywpD 

yxdK 

yxjM 

yycG 

1.4 



atpA 
atpB
atpC 
atpD 
atpE 
atpF 
atpG 
atpH
atpI
cccA 
cccB 
ccdA 
ctaA 



427 two-component sensor histidine kinase [YclJ]

497 two-component sensor histidine kinase [YdbG]

587 two-component sensor histidine kinase [YdfI]

758 two-component sensor histidine kinase [YesN]

903 two-component sensor histidine kinase [YfiK]

008 two-component sensor histidine kinase [YhcZ]
^nsduction pteiotropic re^tcW

1392 two-component sensor histidine kinase [YkoG]
M19 two-component sensor histidine kinase
H32 two-component sensor histidine kinase
2090 two-component sensor histidine kinase [YocG]
2704 two-component sensor histidine kinase [YrkP]
3035 two-component sensor histidine kinase
3112 two-component sensor histidine kinase [YtsA]
3236 two-component sensor histidine kinase [YufM]
3566 two-component sensor histidine kinase [YvcP]
3497 two-component sensor histidine kinase [YvfU]
3385 two-component sensor histidine kinase [YvqA]
3395 two-component sensor histidine kinase [YvqC]
3407 two-component sensor histidine kinase [YvrH]
3741 two-component sensor histidine kinase
4071 two-component sensor histidine kinase [YxdJ]
3992 two-component sensor histidine kinase [YxjL]
4153 two-component sensor histidine kinase [YycF]

MEMBRANE BIOENERGETICS (ELECTRON
TRANSPORT CHAIN AND ATP
SYNTHASE)



ctaB 
ctaC 
ctaD
ctaE 
ctaF 
cydA 
cydB 
etfA 
etfB
fer 
hmp 
narG 
narH 
narI
narJ
ndhF 
qcrA 

qcrB 

qcrC 

qoxA 
qoxB 
qoxC 
qoxD 
resA 

resB 

resC 

tlp
trxA
trxB 
ycgT 
ycnD 
ydbP 
ydeO 
ydfO 

ydgi 

yfkO 
yfmj 
yjdK 

yjto 

ykuN 
ykuP 
ykuU 
ykW 
yneN 
yojN 
yoli 
yosR 
ypdA 
yq>G 

yqJM 
yrkL 
ythA 

ytpP 

ytrC 
ytrO 
yufD 
yvfT 
yvmB 
yumC 
yvsE 
yvU 
yvaB 
ywcG 
yyvhN 
ywrO 

1.5 

cheC 
cheD 
cheR 
cheV 
cheW 



3784 ATP synthase (subunit α)
3787 ATP synthase (subunit a)

3781 ATP synthase (subunit ε)

3782 ATP synthase (subunit β)
3786 ATP synthase (subunit c)

3786 ATP synthase (subunit b)

3783 ATP synthase (subunit γ)

3785 ATP synthase (subunit δ)

3787 ATP synthase (subunit i)
2599 cytochrome c550
3625 cytochrome

1922 required for a late step of cytochrome c synthesis
s^s) 00 ^ 0016 0833 o^ase (required for biosynthe- 

1559 cytochrome caa3 oxidase (assembly factor)

1560 cytochrome caa3 oxidase (subunit II)

1561 cytochrome caa3 oxidase (subunit I)

1563 cytochrome caa3 oxidase (subunit III)

1563 cytochrome caa3 oxidase (subunit IV)

3978 cytochrome bd ubiquinol oxidase (subunit I)

3977 cytochrome bd ubiquinol oxidase (subunit II)

2915 electron transfer flavoprotein (α subunit)

2916 electron transfer flavoprotein (β subunit)
2409 ferredoxin
1372 flavohemoglobin
3829 nitrate reductase (α subunit)
3825 nitrate reductase (β subunit)

3823 nitrate reductase (γ subunit)

3824 nitrate reductase (protein J)
205 NADH dehydrogenase (subunit 5)
2364 menaquinol:cytochrome c oxidoreductase
(iron-sulphur subunit)

2364 menaquinol:cytochrome c oxidoreductase

(cytochrome b subunit)
2363 menaquinol:cytochrome c oxidoreductase

(cytochrome b/c subunit)
3917 cytochrome aa3 quinol oxidase (subunit II)
3916 cytochrome aa3 quinol oxidase (subunit I)
3914 cytochrome aa3 quinol oxidase (subunit III)
3913 cytochrome aa3 quinol oxidase (subunit IV)

2421 essential protein similar to cytochrome c biogenesis
protein

2420 essential protein similar to cytochrome c biogenesis

protein
2418 essential protein similar to cytochrome c biogenesis

protein
1930 thioredoxin-like protein
2912 thioredoxin
3573 thioredoxin reductase
352 thioredoxin reductase
439 NADPH-flavin oxidoreductase
cno thioredoxin 

NAD(P)H oxidoreductase 
thioredoxin 

NADH dehydrogenase 
NAD(P)H-flavin oxidoreductase 
u .o quinone oxidoreductase 
1280 cytochrome c oxidase assembly factor
1299 NADH dehydrogenase 
1486 flavodoxin 
1488 sulfite reductase 
1492 2-cys peroxiredoxin 
1450 thioredoxin 

1929 thiol:disulfide interchange protein
2114 nitric-oxide reductase
2267 thioredoxin 
2159 thioredoxin 
2*01 thioredoxin reductase 
2516 NADH-dependent flavin oxidoreductase 
2475 NADH-dependent flavin oxidoreductase 
2708 NAD(P)H oxidoreductase 
3139 cytochrome d oxidase subunit
3054 thioredoxin H1
3117 cytochrome c oxidase subunit
3116 cytochrome c oxidase subunit
2£S NADH dehydrogenase (ubiquinone)
3246 NADH dehydrogenase 

3300 NADH dehydrogenase 

3301 thioredoxin reductase 
3364 thioredoxin 
3308 NADH dehydrogenase 
™? |^P) H der^rogenase (quinone) 
3911 NADPH-flavin oxidoreductase
3840 ubiquinol-cytochrome c reductase
3708 NAD(P)H oxidoreductase



508 
576 



613 
854 



egc 



MOBILITY AND CHEMOTAXIS

1715 inhibition of CheR-mediated methylation of

methyl-accepting chemotaxis proteins
1715 required for methylation of methyl-accepting
chemotaxis proteins by CheR
2380 methyl-accepting chemotaxis proteins methyltransferase

1473 modulation of CheA activity in response to attractants
(CheW and CheY similar domains)
modulation of CheA activity in response to attractants

1591 flagellar basal-body rod protein
1691 flagellar basal-body rod protein



55 



Table 1. Functional classification of the Bacillus subtilis protein-coding genes.



CELL ENVELOPE AND CELLULAR



cWC 



dacA 



dacB 

dacF 
ddlA 
dltA
dltB
dltC
dltD



dltE
gcaD 



ggaA 

ggaB 

gtaB 
lytB 

lytC

lytD

lytE
mbl
mraY

mreB

mreBH 

mreC 

mreD 

murA 

murB 

murC 

murD 

murE 

murF 

murG



murZ 

pbp 

pbpA 

pbpB 

pbpC 

pbpD

pbpE 

pbpF 

pbpX 

ponA 

racE 

spoVD 

tagA 

tagB 

tagC 

tagD 

tagE

tagF 



tagG 
tagH 
tagO 
tuaA
tuaB
tuaC
tuaD

tuaE
tuaF
tuaG 
tuaH 
wapA 



CELL WALL 93



wprA
xlyA



2665 N-acetylmuramoyl-L-alanine amidase (minor
autolysin)

1873 N-acetylmuramoyl-L-alanine amidase (sporulation
mother cell wall)

157 N-acetylmuramoyl-L-alanine amidase (germination)

282 cell wall hydrolase (sporulation)
18 penicillin-binding protein 5 (D-alanyl-D-alanine
carboxypeptidase) (peptidoglycan biosynthesis)

2424 penicillin-binding protein 5* (D-alanyl-D-alanine
carboxypeptidase) (peptidoglycan biosynthesis)
(spore cortex)

2445 penicillin-binding protein (D-alanyl-D-alanine
carboxypeptidase) (peptidoglycan biosynthesis)

503 D-alanyl-D-alanine ligase A (peptidoglycan
biosynthesis)

3961 D-alanyl-D-alanine carrier protein ligase
(lipoteichoic acid biosynthesis)

3953 D-alanine transfer from Dcp to undecaprenol-phosphate
(lipoteichoic acid biosynthesis)

3964 D-alanine carrier protein (lipoteichoic acid
biosynthesis)

3954 D-alanine transfer from undecaprenol-phosphate
to the poly(glycerophosphate) chain
(lipoteichoic acid biosynthesis)

3955 involved in lipoteichoic acid biosynthesis

56 UDP-N-acetylglucosamine pyrophosphorylase
(peptidoglycan and lipopolysaccharide biosynthesis)

3670 galactosamine-containing minor teichoic acid
biosynthesis

3669 galactosamine-containing minor teichoic acid
biosynthesis

3665 UTP-glucose-1-phosphate uridylyltransferase
3662 modifier protein of major autolysin LytC
(CWBP76)

3660 N-acetylmuramoyl-L-alanine amidase (major

autolysin) (CWBP49)
3687 N-acetylglucosaminidase (major autolysin)

(CWBP90)
1018 cell wall lytic activity (CWBP33)
3747 MreB-like protein

1587 phospho-N-acetylmuramoyl-pentapeptide
transferase (peptidoglycan biosynthesis)

2861 cell-shape determining protein

1517 cell-shape determining protein

2860 cell-shape determining protein

2859 cell-shape determining protein

3778 UDP-N-acetylglucosamine 1-carboxyvinyltransferase
(peptidoglycan biosynthesis)

1592 UDP-N-acetylenolpyruvoylglucosamine reductase
(peptidoglycan biosynthesis)

3049 UDP-N-acetylmuramate-alanine ligase
(peptidoglycan biosynthesis)

1588 UDP-N-acetylmuramoylalanine-D-glutamate ligase
(peptidoglycan biosynthesis)

1586 UDP-N-acetylmuramoylalanine-D-glutamate-2,6-diaminopimelate
ligase (peptidoglycan biosynthesis)

509 UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanyl
ligase (peptidoglycan biosynthesis)

1591 UDP-N-acetylglucosamine-N-acetylmuramyl-(pentapeptide)
pyrophosphoryl-undecaprenol N-acetylglucosamine
transferase (peptidoglycan biosynthesis)

3806 UDP-N-acetylglucosamine 1-carboxyvinyltransferase
(peptidoglycan biosynthesis)

1999 penicillin-binding protein (peptidoglycan biosynthesis)

2583 penicillin-binding protein 2A (peptidoglycan

biosynthesis) (spore outgrowth)
1581 penicillin-binding protein 2B (peptidoglycan

biosynthesis) (cell-division septum)
463 penicillin-binding protein 3 (peptidoglycan

biosynthesis)
3233 penicillin-binding protein 4 (peptidoglycan

biosynthesis)
3535 penicillin-binding protein 4* (peptidoglycan

biosynthesis) (spore cortex)
1033 penicillin-binding protein 1A (peptidoglycan

biosynthesis) (germination)
1765 penicillin-binding protein (peptidoglycan biosynthesis)

2341 penicillin-binding proteins 1A/1B (peptidoglycan
biosynthesis)

2903 glutamate racemase (peptidoglycan biosynthesis)

1584 penicillin-binding protein (peptidoglycan biosynthesis)
(spore cortex)

3680 involved in polyglycerol phosphate teichoic acid
biosynthesis

3681 involved in polyglycerol phosphate teichoic acid
biosynthesis

3682 involved in polyglycerol phosphate teichoic acid
biosynthesis

3680 glycerol-3-phosphate cytidylyltransferase
(teichoic acid biosynthesis)

3679 UDP-glucose:polyglycerol phosphate glucosyltransferase
(teichoic acid biosynthesis)

3877 CDP-glycerol:polyglycerol phosphate
glycerophosphotransferase (teichoic acid biosynthesis)

3675 teichoic acid translocation (permease)

3674 teichoic acid translocation (ATP-binding protein)

3649 teichoic acid linkage unit synthesis
3658 biosynthesis of teichuronic acid
3657 biosynthesis of teichuronic acid
3656 biosynthesis of teichuronic acid

3655 biosynthesis of teichuronic acid (UDP-glucose

6-dehydrogenase)
3653 biosynthesis of teichuronic acid
3652 biosynthesis of teichuronic acid
3651 biosynthesis of teichuronic acid

3650 biosynthesis of teichuronic acid

4029 cell wall-associated protein precursor

(CWBP200, 105, 62)
1153 cell wall-associated protein precursor (CWBP23

and serine protease CWBP52)
1347 N-acetylmuramoyl-L-alanine amidase (PBSX



xlyB

yfnG 

yhdD 

ykuA 

ytbt 

ymaG 

yng8 

yocH 

yodJ 

yojl 

yomC 

ypdO 

ypfP 

ypjH 

yqeE 

wry 

yqil 
yrhL 
yrrfl 
yrW 
yxcC 

ytkC 
ytxN 

yubE 
yvcE 
ywhE 
ywrO 

12 

aapA 

atsT 

amyC 

amyD 

appA 

appB 
appC 
appD 

appF 

araE 
araN 
araP 



azJC 
azID 
bgtP 

bit 

bmr 

braB 

brnO 

citM 

csbX 
cydC 

cydD 

czcD 
dppA 
dppB 

dppC 

dppD 

dppE 

ebrA 
ebrB 
ecsA 
ecsB 
expZ 
feuA 
feuB 
feuC 
fhuB 
fhuC 

fhuD 

fhuG 
fnjA 

gabP 
gtnH 
glnM 
gfnP 
glnO 

gipF 
glpT 
gitP 
gitT 
gtvC 

gntP 

NsP 

hutM 

k>lF 

kdgT 

lap 
levD 

levE 

levF 

ievG 

ticA 

rtcB 

fcC 



prophage-mediated lysis)
1317 N-acetylmuramoyl-L-alanine amidase (PBSX

prophage-mediated lysis)
799 CDP-glucose 4,6-dehydratase
1013 cell wall-binding protein
1467 penicillin-binding protein
1569 lipopolysaccharide core biosynthesis
1865 cell wall protein

1946 UTP-glucose-1-phosphate uridylyltransferase

2093 cell wall-binding protein

2135 D-alanyl-D-alanine carboxypeptidase

2116 cell wall-binding protein

2263 N-acetylmuramoyl-L-alanine amidase

2310 cell wall enzyme

2306 cell wall synthesis

2357 lipopolysaccharide biosynthesis-related protein

2649 N-acetylmuramoyl-L-alanine amidase

2588 peptidoglycan acetylation

2515 N-acetylmuramoyl-L-alanine amidase

2771 acyltransferase

2791 penicillin-binding protein

2818 N-acetylmuramoyl-L-alanine amidase

3157 lipopolysaccharide N-acetylglucosaminyltransferase

3135 autolytic amidase

3161 lipopolysaccharide N-acetylglucosaminyltransferase

3191 N-acetylmuramoyl-L-alanine amidase
3575 cell wall-binding protein
3849 penicillin-binding protein
3697 murein hydrolase 

TRANSPORT/BINDING PROTEINS AND 
LIPOPROTEINS 381 



opuAB 322 
opuAC 323 



opuBB
opuBC



opuBD 
opuCA 



opuCB
opuCC 



opuCD 

opuD
opuE
pbuX
paG 

pt& 

p>rP 
rbsA
rbsB
rbsC
rbsD 
rocC 



sacP 



2766 amino acid permease
1938 amino acid carrier protein
3099 maltose transport protein
3098 sugar transport

1213 oligopeptide ABC transporter (oligopeptide-binding

protein)

1215 oligopeptide ABC transporter (permease)

1216 oligopeptide ABC transporter (permease)

1211 oligopeptide ABC transporter (ATP-binding protein)

1212 oligopeptide ABC transporter (ATP-binding protein)

3485 L-arabinose transport (permease)
2942 L-arabinose transport (sugar-binding protein)
2941 L-arabinose transport (integral membrane protein)

2940 L-arabinose transport (integral membrane protein)

2729 branched-chain amino acid transport 

2728 branched-chain amino acid transport 

4034 phosphotransferase system (PTS) β-glucoside-
specific enzyme IIABC component

2716 multidrug-efflux transporter

2494 multidrug-efflux transporter

3027 branched-chain amino acid transporter

2728 branched-chain amino acid transporter

834 secondary transporter of the Mg2+/citrate complex

2838 α-ketoglutarate permease
3976 ABC transporter required for expression of

cytochrome bd (ATP-binding protein)
3974 ABC transporter required for expression of

cytochrome bd (ATP-binding protein)
2724 cation-efflux system membrane protein

1360 dipeptide ABC transporter (sporulation) 

1361 dipeptide ABC transporter (permease) (sporulation)

1362 dipeptide ABC transporter (permease) (sporulation)

1363 dipeptide ABC transporter (ATP-binding protein) (sporulation)

1364 dipeptide ABC transporter (dipeptide-binding protein) (sporulation)

1865 multidrug resistance protein 
1864 multidrug resistance protein 

1077 ABC transporter (ATP-binding protein) 

1078 ABC transporter (membrane protein)
606 ATP-binding transport protein 
183 iron-uptake system (binding protein) 
182 iron-uptake system (integral membrane protein) 
181 iron-uptake system (integral membrane protein)

3417 ferrichrome ABC transporter (permease) 

3415 ferrichrome ABC transporter (ATP-binding protein)

3418 ferrichrome ABC transporter (ferrichrome-binding protein)

3416 ferrichrome ABC transporter (permease)
1509 phosphotransferase system (PTS) fructose-specific enzyme IIBC component

686 γ-aminobutyrate permease

2802 glutamine ABC transporter (glutamine-binding)

2803 glutamine ABC transporter (membrane protein)

2804 glutamine ABC transporter (membrane protein)
2802 glutamine ABC transporter (ATP-binding protein)

1002 glycerol uptake facilitator

235 glycerol-3-phosphate permease

255 H+/glutamate symport protein

1097 H+/Na+-glutamate symport protein

892 phosphotransferase system (PTS) arbutin-like 
enzyme IIBC component 

4115 gluconate permease (gluconate utilization)

3004 histidine transport protein (ATP-binding protein)

4046 histidine permease 

4077 inositol transport protein 

2322 2-keto-3-deoxygluconate permease (pectin utilization)

330 L-lactate permease 

2762 phosphotransferase system (PTS) fructose-specific enzyme IIA component

2762 phosphotransferase system (PTS) fructose-specific enzyme IIB component

2761 phosphotransferase system (PTS) fructose-specific enzyme IIC component

2760 phosphotransferase system (PTS) fructose-specific enzyme IID component

3959 phosphotransferase system (PTS) lichenan-specific enzyme IIA component

3961 phosphotransferase system (PTS) lichenan-specific enzyme IIB component

3960 phosphotransferase system (PTS) lichenan-specific enzyme IIC component
lmrB 290 lincomycin-resistance protein
lplA 779 lipoprotein
lplB 781 transmembrane lipoprotein
lplC 782 transmembrane lipoprotein
mdr 334 multidrug-efflux transporter (puromycin, norfloxacin, tosufloxacin)
msmE 3097 multiple sugar-binding protein 
msmX 3984 multiple sugar-binding transport ATP-binding 
protein 

mtlA 449 phosphotransferase system (PTS) mannitol-specific enzyme IIABC component

narK 3833 nitrite extrusion protein

nasA 363 nitrate transporter

natA 296 Na+ ABC transporter (extrusion) (ATP-binding protein)

natB 297 Na+ ABC transporter (extrusion) (membrane protein)

nrgA 3756 ammonium transporter 
nupC 4050 pyrimidine-nucleoside transport protein 
oppA 1219 oligopeptide ABC transporter (binding protein) (initiation of sporulation, competence development)

oppB 1221 oligopeptide ABC transporter (permease) (initiation of sporulation, competence development)

oppC 1222 oligopeptide ABC transporter (permease) (initiation of sporulation, competence development)

oppD 1223 oligopeptide ABC transporter (ATP-binding protein) (initiation of sporulation, competence development)

oppF 1224 oligopeptide ABC transporter (ATP-binding protein) (initiation of sporulation, competence development)

opuAA 321 glycine betaine ABC transporter (ATP-binding protein) (osmoprotection)
glycine betaine ABC transporter (permease) (osmoprotection)

glycine betaine ABC transporter (glycine betaine-binding protein) (osmoprotection)
3462 choline ABC transporter (ATP-binding protein) (osmoprotection)
3461 choline ABC transporter (membrane protein) (osmoprotection)
3460 choline ABC transporter (choline-binding protein) (osmoprotection)
3460 choline ABC transporter (membrane protein) (osmoprotection)
3470 glycine betaine/carnitine/choline ABC transporter (ATP-binding protein) (osmoprotection)
3469 glycine betaine/carnitine/choline ABC transporter (membrane protein) (osmoprotection)
3468 glycine betaine/carnitine/choline ABC transporter (osmoprotectant-binding protein) (osmoprotection)

3467 glycine betaine/carnitine/choline ABC transporter (membrane protein) (osmoprotection)
3076 glycine betaine transporter (osmoprotection) 
728 proline transporter (osmoprotection) 
2319 xanthine permease 
K57 phosphotransferase system (PTS) glucose-specific enzyme IIABC component
1459 phosphotransferase system (PTS) enzyme I (general energy coupling protein of the PTS)
1618 uracil permease (pyrimidine biosynthesis) 

3703 ribose ABC transporter (ATP-binding protein) 
3705 ribose ABC transporter (ribose-binding protein)

3704 ribose ABC transporter (permease) 
3702 ribose ABC transporter (membrane protein) 
3876 amino acid permease (arginine and ornithine utilization)

4143 amino acid permease (arginine and ornithine utilization)

3904 phosphotransferase system (PTS) sucrose- 
specific enzyme IIBC component 
1533 small peptidoglycan-associated lipoprotein
2269 sublancin 168 lantibiotic transporter
4188 tetracycline resistance protein
850 phosphotransferase system (PTS) trehalose-specific enzyme IIBC component
2723 potassium uptake 
65 amino acid transporter 

ABC transporter (ATP-binding protein) 
sucrose phosphotransferase enzyme II 
chloramphenicol resistance protein 
ABC transporter (binding protein) 
ABC transporter (permease) 
amino acid transporter
phosphotransferase system enzyme II 
histidine permease 

sodium/proton-dependent alanine transporter 
ABC transporter (ATP-binding protein) 
amino acid permease 
glucarate transporter
efflux system 

ABC transporter (ATP-binding protein) 
ion channel 

ABC transporter (ATP-binding protein) 
transporter 

multidrug-efflux transporter
amino acid transporter 
proline permease 

amino acid ABC transporter (permease) 
amino acid ABC transporter (binding protein)
glutamine ABC transporter (ATP-binding protein)

glutamine ABC transporter (permease)
glutamine ABC transporter (glutamine-binding protein)

di-tripeptide ABC transporter (membrane protein)

ABC transporter (permease) 
transporter 

ferrichrome ABC transporter (permease) 
ferrichrome ABC transporter (permease) 
ferrichrome ABC transporter (ATP-binding protein)

ferrichrome ABC transporter (binding protein)
multidrug resistance protein
copper export protein
branched chain amino acids transporter
ABC transporter (binding protein)
C4-dicarboxylate binding protein
C4-dicarboxylate transport protein
ABC transporter (ATP-binding protein)
metabolite transport protein



sunT 
letB 
rreP 

trJW 
yabM 
ybaE 
ybbF 
ybcL 
ybdA 
ybdB 
ybeC 
>tofS 

ybgF 

ybgH 
ybxA 
ybxG 
ycbE 
ycbK 
ycbN 
yCCK 
ycdl 
ycei 
yceJ 
ycgH 
ycgO 
yckA 
ycKB 
ycki 

ycO 
yckK 

yclF 

yctH 
yen 
ydN 
yo'O 
ydP 

ydO 
ycnB 
yenj 
ycsC 
ydbA 
ydbE 
ydbH 
ydbJ 
ydeG 



151 
191 
212 
217 
218 
231 
257 
262 
264 
150 
227 
270 
277 
280 
298 
309 
317 
320 
337 
347 
368 
368 
410 

410 
411 

417 

424 
426 
432 
433 
434 

435 
437 
448 
457 
493 
497 
500 
502 
566 
ithe-



tec 
fnetB 
metC 



esubunft) 

WosvmhX) SUCanyftrans ferase(a4h i 



)335 owsynihesisj /- a .«.«rase (methionine 

3i28 E^^ j; ^^3r thioninesynmase 

■»o assmiiatofvnii™,.^ . 



358 
355 
1185 



y<io 

yqjR 

yrte 

yrhA 
yrhB 
yrhP 
yrpC 
yrrN 
yrrO 
ysnE 
yrfD 
ytkP 
yubc 
yugH 
yurG 

yurH 

yuit 

yurP 
yurff 
yurT 
yusH 
yusM 
yvsx 
yuri 
yuxt 
yvaK 
yWD 
yyB 
ywaA 
ynaD 
weB 
ywfG 
yv*ftF 
ywhG 
ywro 
yxeP 



II.3



tyrA 
ureA 



-uuuniy — «-cu wfj transfer 

, biosynihJS? a,ea ^ r °Senase(serlne 

3^8 urease Osubunit 

asparaginase 
proline oxidase 

P-M^ 
fe^Mrogenase 



ao-V 
apt 
cdc 
cmk 
ctrA 
tieoD 

era 

drm 

3L.3A 

gvaB 

hipO 
hprT 



™ mpeptidase 
?3f Cysre ' n e synthase 

fTjjp protease 
|® 7 "^ans/erase 

^acytarninoacidracernase 

£f? ^nateamiriffSSerase 

I» o^e^r TOaQ ^^'ase 
*gl opine catabolism 
op<necarabolism 

«54 carboxytes:erase 
*'6 senne aacetyftransferac- 

3£7 aminopepS ^ aCrtaminotran sterase 
55 ^^dehydrogenase , * ^ 

3£uf ???f md ' ne synthase 
agmannase 

,'S' ^ineoea'mirSe" " 83 

2a5i fJ^B kinase 

„ osKlesaiSi.) * < ^^ Me IPurinen octe . 
**' .^^'^se-ptiosphaieairw,.. 

savage; ,,oom wase (punne nucleoside 

b'osynrhesisj cer Vdrogenase (GMP 

1 n, PDura;ehvdrola« 



yabR 
yerA 
yfkN 
yhaM 
yhcR 
yirY 
yjbM 
yjbP 
yfckB 
ytbB 
yfoO 
ymaA 
yncB 
yncF 
yosN 

yosO 

yosP 

yosS 



nin 
nrcE 

nroP 

nucA 
nucB 
Pdp 
pnp 



38 
259 
265 
290 
344 
345 
415 
432 
441 
443 



,8 a e S e t6 ~('-edo^ 
1031 aminoacylase
5f S arlate ^^transferase 
^asi^'^'^'en, 

\}S a !f ara Sine synthase 
\™ opme aminotransferase 

243 sarcosine oxidase



pnpA 
prs 


1739 
53 


purA 


4156 


PurB 
pure 


700 
701 


purO 


710 


purE 


698 | 


purf 


705 p 


purH 


k 

708 p 



238i „ ~ e,pur " 1 «' sarvagel 

subunrj °™" nos l™* reductase (major 

2«o purine nucleoSde?hl P ^ )Sphory!ase 

oside sa^gej 6 ^P^ase (purine nucfe- 



ypfD 

yq:3 

yQ/C 

yrdF 
yrrv 
yumD 
J? ^yunH 
" yunL 
yuri 
y*aC 

II.4
sccA 

1 accB 

accC 

acdA 
acpA 
CtfsA 

dgkA 

fabD 

fabG 

gipO 

,'cfA 

tipA 
tipB 



ij d eoxypur;n e kinase suh. ^ 

«• ™^ U S a8? ^^«3.erase 

TfJ.. ^•""Ceotidase 
J« DNAexonucfease 

fluanyfateicnase 
|68 nbonudeoprotein 
g rnjaococcal nuclease 

unn) pnosph8 '8Wiuctase(a 8uf> 

3328 allantoinase
3332 uricase
3343 ribonuclease 
3949 GTp -^°phosphoiu-na Se 

sSS^ a ^^f s « ^tin cartoxyl carrier 

„ 1 1 ''Pid biosyntnesj { W nrat * t **se (pho Sp hr> 
diacytgveeroi kinase tohn*^ 
sts) se (Pnosphofipid biosynthe- 

metabolisr 

acyfCoAsynthe^sejfa^g^ 

292 lipase 
9 10 ' 



. 910 lipase 

593 S£%£^<^™ 



nap 

pgsA 

P'sX 
pr.tA 
psd 



sis) ^'^^^sefAMPbios^ 
amide synthetase fouZ ^ e SucC) nocarco3 



1 acid decar- 



i«Y7 y,uiam| nase 

IW S?^" 1 !! Protease 
leas SSf^Proiraw 

|S ^"oacWoxtose 
^ r^Ke^^-'-^noa.e 



purK 

Putt 

purM 

purN 

purO 

purT 

pyrAA 

PyrAB 

pyrB 

PyrC 
pyrD 

PyrDll 

pyrE 

PyrF 

relA 
smbA 
tdk 
tftyA 

thyB 

tmk 
udk 
upp 



699 
702 



= Biosynthesis I * 

(^^S^^oiecarti^se,, 
tte^fe^^^ase 

~ »o»eSr mte ^-'e,ase 2 

IK> , ^Ssr^^'^iU*.™ 

« '-^«e11o»^^ 
^S^r^S^ne 

SOU ^ine ^'^^Wosynihes^ 58 ' 

sa^ge, ^ S ^ ansteras e(Pyrirnidine 



KQA 

nrs/ 
>us£. 

>t/so 

>tjs/r 

yuss 
yvaG 
yvrD 

ywfH 

y*h8 
y*iE 
WE 
ywnE 



iwb c f/ tox V l «:eraseNA 
"« P"ospria;id>1glycerr;-+n< n h_ 

" o°^srr™~(^sp Wi pi 0 

fi 7 ^rtwxyiesterase 

«3 3-oxoac^l. 8M«S er L^^'e^ase 
°S9 ipoe,e-Bwe^ a Te ero,0,e,ns > mm =sa 

mi lonn^hf * a cet yt r ar, S fere Se 

J" gs^r^'* 3 - 

' 247 «^SSS! me,p,olei " synthase 

ateSS^^enase 
IflXrS^^^enese 

37s1 S 0 - ,p ! nsyntne tase 
3752 cardrof^nsynina^ 0 



ybflr 

>C'S 
ydbM 
ydcB 
• yfjR 
yhaR 
ytidO 
yhdW 
yfifB 
yhtt 
ytifL 
ytrs 
yhrr 
yisP 
yjax 
yjar 
yjbw 
yjdA 
ykhA 
ytwC 
ymfi 
yngF 
yngG 
yngi 
yngj 
yocE 

yocj 
yodR 
yodS 
yoxD 
w>D 

MX 
yqiS 

ytjfu 
yqjo 



y*jD 
y*j£ 

II.5

bkjA 

bioB 
bioD 
bioF 

bioi 

bioW 

dfrA 

dhaS 
dhbA 

dhb8 

dhbC 

dhbE 



dhbF 
foiA 
tolC 



folK 
99t 

gsaS 
hemA 
hemB 

hemC 

hemD 

hemE 

hemH 
hemL 

hemN 

hemX 
hernY 

menB 

menD 

menE 

menF 

moaB 
moaD 
moaE 
mobA 
mobS 
moeA 
moeB 
mtrA 

nadA 
nadB 
nadC 
narA
nasF 

nifS 
pabA 

paoB 

pabC 

panB 

panC 

panD 

ribA 

ribB 

ribC 

ribG 

ribH 

ribT 
suf 

thiA 

XNC 

thiD 

INK 

yaat 
ydiA 
ydiG 
yhaV 
yhcB 
yhfU 



3743 ^^^^^ (arTier P'aein)rfehy- 
jOOi 3-oxoa(jipa:eCoAKransfefase 
4001 3-oxoadipate CoA-transferase

METABOLISM OF COENZYMES AND PROSTHETIC GROUPS
3094 adenosylmethionine-8-amino-7-oxononanoate aminotransferase (biotin biosynthesis)
309 biotin synthetase (biotin biosynthesis)
3091 dethiobiotin synthetase (biotin biosynthesis)

3089 gj 3 ** 1 ' 0 ™ p4 5Wike enzyme (boon biosynthe- 
3094 ^|^ )xynexanoa ' 9 <^ A l^se (bictin biosyrv 
2296 dihydrofolate reductase (glycine/purine/DNA pre-

3291 n^ , r^^ lhy ? raxybenzoate ^^ehydroge- 
™ nase(2.3^ihydroxyben2oate biosynthesis) 

3291 ssr^^ 11 ^^ 16 

3289 l^'W^ytenzoate-AMP ligase (enterobactin 

rSe^ 

3287 involved in 2,3-dihydroxybenzoate biosynthesis
5 < 2j l ' p0,y 9' lJtarnate synthetase (folate oiosynthe- 

2529 methylenetetrahydrofolate dehydrogenase /

^^ fte ^^ rotoate ^°^ ro ^(Purines 

and amino aods biosynthesis) 

^' n ( ^ K ro f^^o^emytpterin pyrophosphok- 

inase (dihydrofolate biosynthesis) 
200* rglutamyttranspeptidase (glutathione metabo- 

lism) 

ISo ^tamaTe-i^emialdehyde aminotransferase 
ISv E a m^r^^^ taSe(wh ^ nb ^^esis) 
287. ^amtnoievuItfTcaad dehydratase (porphyrin 
biosynthesis) 

2876 Ks? b ,,irK>Sen dearT,inase (P° r P h yrin bkssyn- 
2875 uroporphyrinogen III cosynthase (porphyrin biosynthesis)
1C86 uroporphyrinogen III decarboxylase (porphyrin biosynthesis)
1037 ferrochelatase (porphyrin biosynthesis)
2373 glutamate-1-semialdehyde 2,1-aminotransferase (porphyrin biosynthesis)

2630 coproporphyrinogen III oxidase (porphyrin biosynthesis)

2877 negative effector of the concentration of HemA
Porphyrinogen IX oxidase (porphyrin biosyn- 

3U9 dihydroxynaphthoic acid synthetase (menaquinone biosynthesis)
3151 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase / 2-oxoglutarate decarboxylase (menaquinone biosynthesis)
O-succinylbenzoic acid-CoA ligase (menaquinone biosynthesis)
3153 menaquinone-specific isochorismate synthase (menaquinone biosynthesis)
30K molybdopterin precursor biosynthesis 
M93 molybdopterin converting factor (subunit 1)
KS3 molybdopterin converting factor (subunit 2)
\l£ molybdopterin-guanine dinucleotide biosynthesis
ill- molybdopterin-guanine dinucleotide biosynthesis
149/ molybdopterin biosynthesis protein
1496 molybdopterin biosynthesis protein

sis) CyclohydroIase 1 (tetrahydrofolate bosynthe- 
2846 quinolinate synthetase (quinolinate biosynthesis)
2*49 L-aspartate oxidase (quinolinate biosynthesis)
28-7 nicotinate-nucleotide pyrophosphorylase (NAD/NADP biosynthesis)
NH3-dependent NAD+ synthetase (NAD biosyn-



yibT 
yjbO 
yjbv 

ykpB 
ykvK 
y*vL 
ytbO 
ytnD 
yfnF 
yfoi 
yngH 
yodC 
yvgN 
yqjS 
yrrt 
yrrM 
yueD 
yueJ 
yveK 
yuiG 
yurB 
yurC 
yurO 
yulB 
ywaB 
ywkE 
ywoC 

II.6
phoA 
phoB 
phoD 
phoH 
xpaC 
ybtM 
ykoX 
ytaK 
yngC 



1245 thiamin biosynthesis
1245 thiamin biosynthesis

If!? ^ os Phomethytpy^rnidine kinase 
1513 thiamin biosynthesis 

IS SeCnX ™^SP terih ******* 
>j»o coenzyme POO synthesis 

577 pyrimidinenhiamine biosynthesis 

633 uroporphyrin-III C-methyltransferase

" 5 ^Porprvrin-111 C^trryttransferase 

2127 nitroreductase 

2574 5-formyltetrahydrofolate cyclo-ligase

2469 pantothenate kinase 

2796 folate metabolism 

2795 MffeoyKJoAarnervttransferase 

3265 sepiapterin reductase

3261 pyrazinamidase/nicotinamidase

3260 nicotinate phosphoribosyltransferase
3293 biotin metabolism
3335 4-hydroxybenzoyl-CoA reductase
3338 4-hydroxybenzoyl-CoA reductase
™ 4-hydroxyben2oyVCoA reductase 
3320 lipoic acid synthetase
3950 quinone biosynthesis
3796 protoporphyrinogen oxidase
375o isochorismatase 

METABOLISM OF PHOSPHATE 

1018 alkaline phosphatase A 
621 alkaline phosphatase W 
284 phosphodiesterase/alkaline phosphatase
phosphate starvanrrw/i, .<~qh 



recR 



29 
2836 



ruvB 2835 
sbcD 1)43 



yfpB 
yocl 



1659 
2095 



yorK - 2180 
yqhH 2549 



Holliday junction DNA helicase
Holliday junction DNA helicase
exonuclease SbcD homologue
ATP-dependent DNA helicase
ATP-dependent DNA helicase



yrrC 
yrvE 



2808 
2825 



conjugation transfer protein



y\vqA 3735 SNI 

III.4 DNA PACKAGING AND SEGREGATION

1% l 9 ^ R^flyrase-litep^gS^r 

i 933 DNAgyraseWikeprotein SSnftB 
7 DNA gyrase (subunit A)
™ DNA gyrase (subunit B) 
2385 non-specific DNA-binding protein HBsu
1666 chromosome segregation SMC protein homologue

1682 DNA processing Smf protein homologue
■;^ 0 ^3 DNAtopotsomerasel w 
topB 476 DNA topoisomerase III
yonN 2225 HU-related DNA-binding protein



9TA 

gyrB 

hbs 
smc 

srrtf 
topA 



III.5 RNA SYNTHESIS



5,5 spss^asssssr- 



II.7
yisZ 
yitA 
yitB 
yfnB 
ytnC 
yutH 
yvgQ 
yvqfl 

«! 
III.1
dnaA 
dnaB 



248 alkaline phosphatase 

1409 alkaline phosphatase 

IS? p |? 0 ? pnare starvation «Jucibte protein 

1947 alkaline phosphatase 

METABOLISM OF SULPHUR

1170 adenylylsulfate kinase

1171 sulfate adenylyltransferase
phosphoadenylylsulfate sulfotransferase

1632 sulfate adenylyltransferase
1633 adenylylsulfate kinase
3293 sulfite oxidase
3431 sulfite reductase
3433 sulfite reductase
INFORMATION PATHWAYS

DNA REPLICATION

0 initiation of chromosome replication



III.5.1
* sigA 
sigB 
sigD 

sig£ 

sigF 

sqG 

sigH 

Si'gL 
sigV 
sigW 
stgX 
sigY 
stgZ 



INITIATION 



.....244 
...19 



3143 



dnaC 
dnaD 
dnaE 
dnaG 
dnal 
dnaN 
dnaX 
hotB 
potA 
poiC 
priA 
mh 
np 
ssb 
yerF 
yerG 
yoqV 
yon. 
ypcP 
ywpH 

III.2

adaA 



- " r a uu ' ' Ul cnromosDrre replication 
2965 initiation of chromosome replication / membrane attachment protein
4158 replicative DNA helicase
25 r n S :ion L 0fchrOfn0S0,7 * r epfica'Jcn 
fJrfe EK A P°hrfl^se III (o subunit} 
2o03 DNA primase

2953 primosome component (helicase loader)
2 DNA polymerase III (β subunit)
li RJS f^^se '» (T»VJ r subunEs) 
2975 DNA polymerase I
1727 DNA polymerase III (α subunit)
ib-3 primosoma! repiicatior iaaor y 
1677 ribonuclease H
2013 replication terminator protein
4jS9 single-strand DNA-binding protein
719 ATP-dependent DNA helicase
721 DNA ligase 
2192 DNA ligase 

fi 7 ? ^ A P 0, y rTi eraseill(as^nit) 
«ii 5 -3 exonuclease 
3740 single-strand DNA-binding protein



nil i uir t\jv4 

™^ pofymerase general s^ess sioma factor 

17,6 ssswj^^s^a 

^ sporulacon fcxespore-spedfic 

16M f ea ^ m afactor(^)(SpottAC) ^ C 
1605 ^polymerase sporulation (orespore-specific 

»' R ^ A P°^erasevegetaLveandeanystationarv 
35t 3 g^^afactwfo-JfSpcOfl earTystai,onaf y- 
™ ^Porymerase sigma factor fcr) 

» ^p^ e e : a Lic c ^ s ^^ 

W//C 2TO l^SS eraS6E ^^S " 
spowo 2701 RNA polymerase sporulation nxsher ce'lsLcifir 

spotVCB2te2 t ^ A ,S ^ fa «or(^)(C^ e rn^alfr Sf ^ fiC 

spolVCB 2652 RNApoiymerase sporuladon mahe^ell-specific 

1324 K^'^iS^-^^half) 
™ Pofymerase PBSX sigma factor-like 

Jln ^APO^ase srgma factor 
1543 RNA polymerase ECF-type sigma factor



/hdM 
ykoZ 
yfaC 



III.5.2
abn 



REGULATION . 



.213 



abrB 

acoR 

ahrC 

aisR 

ansR 

araR 

azB 
birA 



thesis) . 

3772 morybdopterin precursor biosynthesis 
35o uraporphyrin-lll Omethyttransferase (rjorphyrin 

-o.n •wyn^esis) "'^t* -vA» 

2849 required for NAD biosynthesis ^ J ^ 

84 p-aminobenzoate synthase glutamine amido- ' 
transferase (subunit B) / anthranilate synthase 
(subunit II) (folate and tryptophan biosynthesis) 
t" ^aminobenzoate synthase (subunit A) (folate 
biosynthesis) 

65 l^n^o^orismate lyase (lolaie bios>Titne- 

2354 ketopantoate rydroxymethyrtransferase (pan- 
tqtnenatebiosynthes'S) 

2353 pantothenate synthetase (pantothenate biosyn- 
thesis) 

2352 espanate l-oecartoxyiase (pantothenate biosyn- 
tnesis) 

2429 GTP cyclohydroiase It / 3.4-dihydroxy-2-bu^none 
, , ^ 4 -p t 5 sphate s V nt h a se (iboflavin biosynthesis) 
2-^9 nboflavin syntnase (a subunit) (riboflavin b^syrv 
thesis) 

1737 riboflavin kinase / FAD synthase (riboflavin 
biosynthesis) 

2431 ribofiavin-specifie deaminase (riboflavin biosyn- 
thesis) ' 

2428 riboflavin synthase (0 subunit) (riboflavin biosyn- 
thesis) ' 

2427 reductase (riboflavin biosynthesis) 

66 dihydropteroate synthase (dihydrofolate bios-zn- 
tnesis) 

955 synthesis of the oyrimioine moiety of thiamin fmi- 
amir biosynthesis) * 

3930 thiamine-phosohate pyrophosphoryfase (thiamin 
twosynthesis) 

3900 phosphometnyipyrimidine kinase (thiamin biosyn- 
thesis) 

3931 hydroxyethyftniazole kinase (thiamin biosynme- 

26 isochorismatase 

640 thiamin-monccr.Gspnate kinase 

646 mofybdoptenn crecursor biosynthesis 

1058 coproporphyrinogen ill oxidase 

979 ftavodoxm 

1112 bioan biosynthess 



adaB 
atkA 
dat 
dinB 
dinG 
?xoA 
mtbP 
mutL 
mutM 
mutS 
mutT 
ntb 
sms 
ung 
uvrA 
uvrB 
uvrC 
uvrX 
ydiO 
ydiP 
ydiS 
yfhO 
yf } P 
yisT 
yjcD 
yjhB 
yozK 
yprA 
ywA 
yQfS 
wh 

yq/W 
yshC 
yshD 
ysxA 
yvc/ 
ywjD 
yxU 



addA 
BddB 
recA 



recF 



DNA RESTRICTION/MODIFICATION AND REPAIR
204 metrMprwprStrte^^ 
w» ^'^ona'activatorcf^eadaAffcperon 

2tb DNA-3-methyladenine glycosylase
W21 O6-methylguanine-DNA alkyltransferase
608 nuclease inhibitor
2352 ATP-dependent DNA helicase
4193 3'-exo-deoxyribonuclease
2 ,ill ^^'cat'onmethylaseB^ 
1778 DNA mismatch repair
2972 formamidopyrimidine-DNA glycosidase
1775 DNA mismatch repair (recognition)
488 mutator protein
2345 endonuclease III (DNA repair)
106 DNA repair protein homologue
3897 uracil-DNA glycosylase
3612 excinuclease ABC (subunit A)
36U excinuclease ABC (subunit B)
2912 excinuclease ABC (subunit C)
2271 UV-damage repair protein

DNA-methyltransferase (cytosine-specific)

A/G-specific adenine glycosylase
DNA-3-methyladenine glycosidase II
1165 nuclease inhibitor 

1255 ATP-dependent DNA helicase

1290 mutator MutT protein 

2064 DNA repair protein 

2336 ATP-dependent helicase

2329 ATP-dependent helicase

2593 endonuclease IV

2483 DNA-damage repair protein

2465 ATP/GTP-binding protein

2924 DNA polymerase β

2922 DNA mismatch repair protein

2862 DNA repair protein

3572 mutator MutT protein 

3817 UV-endonuclease 

3964 DNA-3-methyladenine glycosidase

DNA RECOMBINATION

«6 ATP-dependent deoxyribonuclease (subunit B)
1764 multifunctional protein involved in homologous recombination and DNA repair (LexA-autocleavage)

P^f '- S ' T and genetic 'ecc— t^at^n 



bltR 

bmrR 

ccpA 

cheB 



cheY 



655 
656 
660 
935 
872 



15,7 (Ab n r?Sk^ 

45 transcriptional pleiotropic regulator of transition 
2522 eSn^^ 

37,1 o^rs^ uiatorofme ^ o,acta ^ 
3485 ,sis^ ssofo,ffe3 ^ ose ^ 

ri" SCT - P ^! repressofof: he a z«CDoperon 

' D ' 0t,n ^tyK^rboxylasesyn- 
2716 transcriptjonal regulator of the firtDoperon 
2495 transcriptional activator of the bmrUR operon
3044 transcnptwrBlregufatorirwrvednwrtSn 
catabolite control 

17,1 J^ mponentres Ponseregufa--cc-like[CheAJ/ 
1703 two-component response regulator [CheA]

!3SSisT x,ulaIion 01 flaseilar **** bias ' 

,02 ° g a n ^sr ;re ^ oro,me&r3:Bsynihasei 

832 two-component response regulator [CitS]

comA 3253 two-component response regulator [ComP] of late competence genes / surfactin production
1H7 competence transcription factcr(CTF). final 

s?mir TOn " ois ^ chpnvt 

3256 transcriptional regulator of late competence operon (comG) and surfactin expression (srfA)

SScSSS™ 1 rePr6SSOr ° f C ' aSS m Be"" 
1163 transcriptional activator involved in degrada- 

amid°o&S^ 
3644 ^^^.P^nt response regulates PegSl 

involved in degradative enzyme arxj c^oetence 
regulation (sacft de^a com*) ™ 00m » , f m « 
4052 uanscnptional repressor of the drafr^c,^ 
™, pPe^.fdeoxyribonucleoside) ^ P 
(^S^ Uto °' anae ^^ 

1507 (XflSr°^ ,repfe ^°^ efri ^ operon 

2904 transcriptional regulator required for expression of late spore coat genes
3739 janscriptional repressor involved in tr* expres- 
i*» f^or^ep^photransferasesysarn^ 
kso transcnptiona) antiterminator essencaJ for the 
,R77 ^^^nofthepcGH/operon ^ tfie 
1677 transcriptional repressor of the glutamine synthetase gene (glnA)

^f^°n 3 'antiiermir«toranccc^of 
mRNA stability of glpD 

2014 wnsaipdona! activator of the tfuian-^-mmthaw 
y'fl ^. t 5 ^anaciipticnatreoressorofthec-.*^^,;,.,,. 



c/fT" 
codv 



comAT 

comO 

cfsfl 

de^A. 

degV 

deoR 

fnr 

fruR 

gerE 

gicR 
gicT 
glnR 
g!pP 
g:tC 



gutft 
hpr 
hrcA 
hutP 

kdgR 

bcft 

levR 

iexA 
ScR 



tmrA 
bpA 

trpB 

trpC 
tyiR 
fytT 

msmR 
mta 

mtrS 

paiA 

pais 
phoP 

pksA 
purR 
pyrR 

rbsR 



ribR 
nxR 
sacT 
sacV 
sacY 
senS 
sinR 

sir 

splA 

spoOA 

spoOF 

spoltID 

spoVT 

tenA 

tent 

tnrA 

ueR 

xre 

xytR 



290 
551 



552 



476 



1781 
54 



1618 



yacF 88 



ybbB 
ybdJ 
ybfl 
ybfP 
ybgA 
ycbB 
ycbG 
ydbL 
yccH 
yceK 
.ycg< 
ydA 

ycU 

ycnC 

ycnF 

ycnK 
ycsO 
ycxD 

yczG 
ydaA 
ydbG 
ydcN 



667 transcriptional activator of the sorbitol dehydrogenase gene (guiA)
1073 transcriptional repressor of sporulation and extracellular proteases genes (aprE, nprE, sin)
2629 transcriptional repressor of class I heat-shock genes (dnaK, groESL)
4040 transcriptional activator of the histidine utilization operon (hutPHUIGM)
4084 transcriptional repressor of the myo-inositol catabolism operon (iolABCDEFGHIJ / iolRS)
2325 transcriptional repressor of the pectin utilization 

operon (kdgRKAT) 
3509 transcriptional repressor of the β-galactosidase gene (lacA)

2765 transcriptional activator of the levanase operon (levDEFG / sacC)
1918 transcriptional repressor of the SOS regulon
3963 transcriptional regulator (antiterminator) of the lichenan operon (licBCAH)
4012 transcriptional antiterminator required for substrate-dependent induction and catabolite repression of bglPH

transcriptional repressor of the lincomycin operon (lmrBA)

transcriptional Lrp-like regulator (repression of glyA transcription and KinB-dependent sporulation)

transcriptional Lrp-like regulator (repression of glyA transcription and KinB-dependent sporulation)

transcriptional regulator (Lrp/AsnC family)
3662 attenuator role for lytABC and lytR expression
2956 two-component response regulator [LytS] involved in the rate of autolysis
3096 transcriptional regulator (LacI family)
3764 transcriptional activator of multidrug-efflux transporter genes (bmr and blt)
2384 tryptophan operon RNA-binding attenuation protein (TRAP)

3304 transcriptional repressor of sporulation, septation and degradative enzyme genes (aprE, nprE, phoA, sacB)

3304 transcriptional repressor of sporulation and degradative enzyme genes
2978 two-component response regulator [PhoR] involved in phosphate regulation (phoA, phoB, phoD, resABCDE)

transcriptional regulator of the polyketide synthase operon (pks)

transcriptional repressor of the purine operon (purEKBCLOFMNHD)

transcriptional attenuation of the pyrimidine operon (pyrPBCADFE) / uracil phosphoribosyltransferase activity (minor) (pyrimidine biosynthesis)
3700 transcriptional repressor of the ribose operon (rbsRKDACB)
2417 two-component response regulator [ResE] involved in aerobic and anaerobic respiration (resA, ctaA, qcrABC, fnr)
3001 transcriptional regulator of riboflavin biosynthesis 
genes 

4145 transcriptional activator of arginine utilization operons (rocABC, rocDEF)
3906 transcriptional antiterminator involved in positive regulation of sacA and sacP
532 transcriptional regulator of the levansucrase gene (sacB)

3942 transcriptional antiterminator involved in positive regulation of levansucrase and sucrase synthesis

959 transcriptional regulator of extracellular enzyme genes (amyE, aprE, nprE)

2552 transcriptional regulator of post-exponential-phase response genes (aprE, comK, kinB, sigD, spo0A, spoIIA, spoIIE, spoIIG)

3529 transcriptional activator of competence development and sporulation genes

1461 transcriptional regulator of the spore photoproduct lyase operon (splAB)

2518 two-component response regulator [KinC] central for the initiation of sporulation (spo0A, abrB, kinA, kinC, spoIIA, spoIIE, spoIIG) (part of phosphorelay: Spo0B~P -> Spo0A~P)

3809 two-component response regulator [KinA, KinB] involved in the initiation of sporulation (part of phosphorelay: Spo0F~P -> Spo0B~P)

3748 transcriptional regulator of σE- and σK-dependent genes

64 transcriptional positive and negative regulator of σG-dependent genes

1242 transcriptional regulator of extracellular enzyme genes (aprE, nprE, phoA, sacB)

1243 transcriptional activator of extracellular enzyme genes

1397 transcriptional pleiotropic regulator involved in global nitrogen regulation (expression of nrgAB, nasB, gabP, ureABC, glnRA)
transcriptional repressor of the trehalose operon (trePAR)

transcriptional repressor of PBSX genes
transcriptional repressor of the xylose operon (xylAB)
transcriptional regulator (nitrogen regulation protein)

transcriptional regulator (AraC/XylS family)
two-component response regulator [YbdK]
transcriptional regulator (AraC/XylS family)
transcriptional regulator (AraC/XylS family)
transcriptional regulator (GntR family)
two-component response regulator [YcbA]
transcriptional regulator (GntR family)
two-component response regulator [YcbM]
two-component response regulator [YccG]
transcriptional regulator (ArsR family)
transcriptional regulator (LysR family)
transcriptional regulator (LysR family)
two-component response regulator [YclK]
438 transcriptional regulator (TetR/AcrR family)
441 transcriptional regulator (GntR family) / aminotransferase (MocR-like)
transcriptional regulator (DeoR family)
transcriptional regulator (IclR family)
transcriptional regulator (GntR family) / aminotransferase (MocR-like)
transcriptional regulator (ArsR family)
transcriptional antiterminator (BglG family)
two-component response regulator [YdbF]
transcriptional regulator (phage-related) (Xre family)



■ yceC 

ytieE 

ydeF 56* 
ytiel 571 



853 
1321 



185 
221 
244 
251 
258 
267 
273 
278 
296 
320 
341 
412 
426 



449 
461 
406 

439 
467 
499 
631 



ydeS 
ydeT 
yeJfD 

ydfl 

ydgG 

ydgJ 

ydhC 

ydhO 

yerO 

yesN 

yesS 

yet 

yezC 

yftF 

yfiK 

yW 

yfmP 

ygaG 

yhbl 

yhcF 

yhcZ 

yhdl 

yhdO 
yttgD 
yhjM 
yi$R 
yisV 

yldC 

yjdt 

yjmH 

ykoG 

ykoM 

ykuM 

ykvE 

ykvZ 

ymfC 

ynel 

yoall 
yobD 

yobO 
yocG 
yofA 
yonR 

y02A 
yozG 
yptP 
ypoP 
yppO 
ypuN 
yqaE 

yqd 

yqfV 

yqhN 

yqiR 

yqkL 

yraB 

yraN 

yrdQ 

yrhl 

yrhM 

yrkP 

ysiA 

ysrnB 

ytdP 

ytU 

ytrA 

ytsA 

yxzE 

yutM 

yugG 

yulB 

yurK 

yvsO 

yusT 

yvbA 

yvbU 

yvcP 

yvdE 

yvdT 

yvfl 

yvfU 

yvhj 

yvkB 

yvoA 

yvpA 

yvqC 

yvrH 

ywaE 

ywbl 

ywfK 

ywhA 

ywoH 

ywqM 

yvvrC 

ywtF 

yxaD 

yxdJ 

yxjL 

yxjO 

yyaG 

yyaN 

yybA 

yybE 

yycF 

yydK 

III.5.3

greA 

mfd 

papS 

rpoA 

rpoB 

rpoC 

rpoE 

yvrE 



562 transcriptional regulator (AraC/XylS family)
56* transcriptional regulator (AraC/XylS family)
transcriptional regulator (GntR family) / aminotransferase (MocR-like)
transcriptional regulator (GntR family) / aminotransferase (MocR-like)
transcriptional regulator (TetR/AcrR family)
transcriptional regulator (ArsR family)
transcriptional regulator (GntR family) / aminotransferase (MocR-like)
two-component response regulator [YdfH]
transcriptional regulator (MarR family)
transcriptional regulator (MarR family)
transcriptional regulator (GntR family)
transcriptional regulator (GntR family)
transcriptional regulator (TetR/AcrR family)
two-component response regulator [YesM]
transcriptional regulator (AraC/XylS family)



578 
579 
583 

589 



transcriptional regulator (MarR family)
inCfarr 



transcriptional regulator (Lrp/AsnC family)
transcriptional regulator (AraC/XylS family)
905 two-component response regulator [YfiJ]
916 transcriptional regulator (MarR family)
812 transcriptional regulator (MerR family)
944 transcriptional regulator (Fur family)
976 transcriptional regulator (MarR family)
881 transcriptional regulator (GntR family)
1009 two-component response regulator [YhcY]
1027 transcriptional regulator (GntR family) / aminotransferase (MocR-like)
1033 transcriptional regulator (MerR family)
1089 transcriptional regulator (TetR/AcrR family)
1129 transcriptional regulator (LacI family)
1162 transcriptional regulator (AraC/XylS family)
1166 transcriptional regulator (GntR family) / aminotransferase (MocR-like)
1270 transcriptional antiterminator (BglG family)
1277 transcription regulation
1308 transcriptional regulator (LacI family)
1391 two-component response regulator [YkoH]
1398 transcriptional regulator (MarR family)
1485 transcriptional regulator (LysR family)
1433 transcriptional regulator (MarR family)
1455 transcriptional regulator (LacI family)
1754 transcriptional regulator (GntR family)
1923 two-component response regulator (CheY homologue)

2045 transitional regulator (LysR family) 
2056 transactional regulator (phage-related) (Xre fami- 
ly) 

2080 trarsciptional regulator (AraC/XylS family) 
2091 rAC-comconem response regulator [YocP] 
2007 trar^cr-ptional regulator (LysR lamiiy) 
2221 tra-^T'Ottonai regulator (phage-related) (Xre fami- 
ly) 

2084 uanscr ctional regulator (ArsR family) 
2043 t'arsrr ptionai regulator 
2294 trarsr clonal regulator (a -dependent) 
2287 tra*i3" ptiona! regulator (MarR family) 
2287 ira-sc.stional regulator (PilB family) 
2414 neg regulator of a* activity 
2698 tra*^r ctionai regulator (phage-related) 

(Xre^iiy) 
2657 tra-scriotonal regulator (ArsR family) 
259 1 transcriptional regulator (Fur family) 
2543 tra-scptional regulator 
2506 tra-'S Motional regulator (o -dependent) 
2450 ^a-sc-rstiona! regulator (Fur family) 
2755 r/ansc?:oitonal regulator (MerR family) 
2746 transcr ptional regulator (LysR family) 
2721 transactional regulator (LysR family) 
2777 transcriptional regulator (TetR/AaR family) 
2770 ana-s-gma factor {o v ] 
2704 two-component response regulator [Y rkO] 
2918 trarsc^iptional regulaior {TetR/AcrR family) 
2904 traTscriotional regulator (MarR family) 
3033 trarsctptional regulator (AraC/XylS family) 
3008 tr3"*=riptional regulator (LysR family) 
31 18 transcr ptional regulator (GntR family) 
3113 tv.u-c&T.ponent response regulator [YtsB] 
3071 trarscr^tional regulator (DeoR family) 
3238 two-companem response regulator (YufL] 
3227 tracer pnonal regulator (Lrp/AsnC family) 
3201 tra-.sc- otionai regulator (DeoR family) 
3345 trans" ptional regulator (GntR family) 
3374 fa-icriotional regulator (MarR family) 
3377 tra*scr:o!ionat regulator (LysR family) 
3466 tra"S":otional regulator (ArsR family) 
3488 tra-scrtotional regulator (LysR family) 
3567 r.vo-ccmponent response regulator 1 [YvcQ] 
3558 tra'-scrioiional regulator (Lad (amily) 
3540 traj-scT'Ofonal regulator (TetR/AcrR family) 
3509 transcriptional regulator (GntR family) 
3496 rAO-COmoonent response regulator [YvfT) 
3646 tra-5criptional regulator 
3617 trarscrpiionai regulator (T etR/AcrR family) 
3596 tra-^SDripttonal regulator (GntR family) 
3385 rAO-component response regulator [YvqB] 
3394 two^omponent response regulator [YvqE] 
3409 twc-component response regulator [YvrG] 
3945 ra-scriotional regulator (MarR family) 
3932 trarscriptional regulator (LysR family) 
3864 rja- sc-iotional regulator (LysR family) 
3853 tra-s—'Ptionai regulator (MarR family) 
3748 tra-scr;ptional regulator (MarR family) 
3723 tra-sc':ptional regulator (LysR family) 
3720 trar.scr;ctional regulator (Lrp/AsnC family) 
3693 tra^^plional regulator 
4109 tra-scnotjonat regulator (MarR family) 
4072 TAC-ccmponent response regulator [YxdK] 
3993 tv.ocomponent response regulator [YxjM] 
3991 trarscr.ptional regulator (LysR family) 
4197 trarscriptional regulator (Lad family) 
4189 tra^scr optional regulator (MerR family) 
4183 trar-scriptiona! regulator (MarR family) 
4180 transcriptional regulator (LysR family) 
4154 rif.o-smponent response regulator fVycGJ 
4122 trarscriptional regulator (GntR family) 

ELONGATION 8

2791 transcription elongation factor
60 transcription-repair coupling factor
2356 poly(A) polymerase
149 RNA polymerase (α subunit)
122 RNA polymerase (β subunit)
126 RNA polymerase (β' subunit)
3812 RNA polymerase (δ subunit)
3406 RNA polymerase



HI 5 4 
nusA 
nusG 

yvV 

111.6 
cspR 
deaD 
miaA 
queA 

mcS 
mpA 
rpf. 

tmD 
trvA 
tnjB 
ydbR 
yetA 

yfjo 

ylml 
ytoM 
ypfR 
ysgA 
yogi 

1117 

UL7.1 
rpiA 
rp!3 
rpC 
rp!D 
rptS 
rpiF 

n>:; 

rpU 
rp!K 
rplL 

rjrJW* 
rpK> 
rpiP 
rpsO 
rplR 
rpiS 

rprr 

rpi'J 
rprv 
rp.W 
rp!X 

rpmA 
rpm3 
rpnC 
rprr.D 
rpn~ 
rprrF 
rprr.G 
rpmh 
rprr.i 
rpnj 
ros3 
rpsC 
rpsD 
rpsE 

rpsr 
rpsG 
rpsH 
rpsl 
rpsJ 
rpsK ' 
rpsL 
rps.'.f 
rps\' 
rpsO 
rpsP 
rpsO 
rp$R 
rpsS 
rpsT 
rvsU 
ybxF 
yhzA 
ylxO 
yvyD 

11172 
ataS 
argS 
asnS 
aspS 
cysS 
gtix 
g'yo 
gtys 

NsS 

hisZ 

iieS 

leuS 

tysS 

meiS 

pheS 

pheT 

proS 

serS 

mrS 

thrZ 

VPS 

tyrS 

tyrZ 

va'S 

ytpR 

111.73 
fmt 
infA 
infB 
infC 
rbfA 
ykrS 

0.7.4 
efp 
fus 



TERMINATION

1732 transcription termination
118 transcription antitermination factor
3804 transcriptional terminator Rho
2529 transcription termination



RNA MODIFICATION 19

970 rRNA methylase homologue
4016 ATP-dependent RNA helicase
1866 tRNA isopentenylpyrophosphate transferase
2834 S-adenosylmethionine tRNA ribosyltransferase
(queuosine biosynthesis)
1665 ribonuclease III
4214 ribonuclease P (protein component)
2901 ribonuclease PH
2833 tRNA-guanine transglycosylase (queuosine
biosynthesis)
1675 tRNA methyltransferase
153 pseudouridylate synthase I
1736 tRNA pseudouridine 55 synthase
511 ATP-dependent RNA helicase
737 RNA methyltransferase
873 RNA methyltransferase
816 RNA helicase
1647 RNA-binding Sun protein
2595 ATP-dependent RNA helicase
2931 rRNA methylase
3225 polyribonucleotide nucleotidyltransferase

PROTEIN SYNTHESIS 96

RIBOSOMAL PROTEINS 56

119 ribosomal protein L1 (BL1)
137 ribosomal protein L2 (BL2)
136 ribosomal protein L3 (BL3)
136 ribosomal protein L4
141 ribosomal protein L5 (BL6)
142 ribosomal protein L6 (BUB)
4163 ribosomal protein L9
120 ribosomal protein L10 (BL5)
119 ribosomal protein L11 (BL11)
121 ribosomal protein L12 (BL9)
154 ribosomal protein L13
140 ribosomal protein L14
144 ribosomal protein L15
139 ribosomal protein L16
150 ribosomal protein L17 (BL15)
143 ribosomal protein L18
1675 ribosomal protein L19
2952 ribosomal protein L20
2855 ribosomal protein L21 (BL20)
138 ribosomal protein L22 (BL17)
137 ribosomal protein L23
141 ribosomal protein L24 (BL23) (histone-like protein
HPB12)
2854 ribosomal protein L27 (BL24)
1655 ribosomal protein L28
140 ribosomal protein L29
144 ribosomal protein L30 (BL27)
3802 ribosomal protein L31
1575 ribosomal protein L32
117 ribosomal protein L33
4215 ribosomal protein L34
2952 ribosomal protein L35
148 ribosomal protein L36 (ribosomal protein B)
1717 ribosomal protein S2
139 ribosomal protein S3 (BS3)
3035 ribosomal protein S4 (BS4)
143 ribosomal protein S5
4199 ribosomal protein S6 (BS9)
130 ribosomal protein S7 (BS7)
142 ribosomal protein S8 (BS8)
154 ribosomal protein S9
135 ribosomal protein S10 (BS13)
143 ribosomal protein S11 (BS11)
130 ribosomal protein S12 (BS12)
143 ribosomal protein S13
142 ribosomal protein S14
1733 ribosomal protein S15 (BS18)
1673 ribosomal protein S16 (BS17)
140 ribosomal protein S17 (BS16)
4198 ribosomal protein S18
138 ribosomal protein S19 (BS19)
2635 ribosomal protein S20 (BS20)
2620 ribosomal protein S21
129 ribosomal protein L7AE family
965 ribosomal protein S14
1733 ribosomal protein L7AE family
3631 ribosomal protein S30AE family



AMINOACYL-TRNA SYNTHETASES

2300 alanyl-tRNA synthetase
3834 arginyl-tRNA synthetase
2347 asparaginyl-tRNA synthetase
2316 aspartyl-tRNA synthetase
113 cysteinyl-tRNA synthetase
111 glutamyl-tRNA synthetase
2508 glycyl-tRNA synthetase (α subunit)
2607 glycyl-tRNA synthetase (β subunit)
2817 histidyl-tRNA synthetase
3588 histidyl-tRNA synthetase
1613 isoleucyl-tRNA synthetase
3104 leucyl-tRNA synthetase
89 lysyl-tRNA synthetase
46 methionyl-tRNA synthetase
2930 phenylalanyl-tRNA synthetase (α subunit)
2929 phenylalanyl-tRNA synthetase (β subunit)
1725 prolyl-tRNA synthetase
21 seryl-tRNA synthetase
2960 threonyl-tRNA synthetase (major)
3855 threonyl-tRNA synthetase (minor)
1219 tryptophanyl-tRNA synthetase
3037 tyrosyl-tRNA synthetase (major)
3946 tyrosyl-tRNA synthetase (minor)
2869 valyl-tRNA synthetase
3052 phenylalanyl-tRNA synthetase (β subunit)

INITIATION ... 



..25 



1646 methionyl-tRNA formyltransferase
148 initiation factor IF-1
1733 initiation factor IF-2
2952 initiation factor IF-3
1736 ribosome-binding factor A
1423 initiation factor eIF-2B (α subunit)

ELONGATION... 



2538 elongation factor P 
131 elongation factor G



tepA 
tsf 
tutA 
yiaG 

IH.7.5 
frr 
prtA 
pffB 

III.3 
amhX 
& 

map 
pep 
ppiB 
prkA 
tgt 

ybdM 
ydiC 
ydiD 
ydiE 
yfkj 
yflG 
yjcK 
ykrB 
ykvY 
yioP 
yppP 
yqoT 
yxjhT 
ytet 
ytjP 
ytvA 
ytxM 
yuiE 
ywtE 
yxaL 

111.9 
dnaK 
groEL 
groES 

ykkC 
ykkO 
yvdR 
yvdS 

U 
iv.i 
bsaA 
. c/pC 

ctpE 
cipP 

dpQ 
dpX 

clpy 
csbB 
cspB 
cspC 
cspD 
cstA 
etc 
degQ 
degR 
dnaJ 
dps 



2632 GTP-binding protein 
1718 elongation factor Ts 
133 elongation factor Tu
1546 GTP-binding elongation factor

TERMINATION . 



...27 



1720 ribosome recycling factor
3797 peptide chain release factor 1
3627 peptide chain release factor 2 

PROTEIN MODIFICATION 

325 amidohydrolase
3593 prolipoprotein diacylglyceryl transferase (lipopro-
tein biosynthesis)
147 methionine aminopeptidase
287 pyrrolidone-carboxylate peptidase
2435 peptidyl-prolyl isomerase
973 serine protein kinase
3212 transglutaminase
224 protein kinase
642 glycoprotein endopeptidase
643 ribosomal-protein-alanine N-acetyltransferase
643 glycoprotein endopeptidase
862 protein-tyrosine phosphatase
840 methionine aminopeptidase
1261 ribosomal-protein-alanine N-acetyltransferase
to26 formylmethionine deformylase
1453 Xaa-Pro dipeptidase
1651 protein kinase
2287 peptide methionine sulfoxide reductase
2££i ribosomal protein L11 methyltransferase
2539 Xaa-Pro dipeptidase
3020 protease IV
3068 Xaa-His dipeptidase
3105 protein kinase
3150 prolyl aminopeptidase
3297 leucyl aminopeptidase
3791 protein-tyrosine-phosphatase
4102 serine/threonine protein kinase

PROTEIN FOLDING .. 



yvfE 
yvt8 
ywjC 
ywqD 
ywqE 
„.3 ywsC 
ywtA 
ywtB 
yyxA 



IV.2 
aadK 
ahpC 
ahpF 

bmrU 

cah 

cypA 

cypX 

katA 

katB 

kaiX 



gbsA 

gbsB 

grpE 

gsiB 

gspA 

hit 

htpG 

htrA 

ispU 

lonA 

tonB 

mrgA 

rsbR 

rsbS 

rsbT 

rsbU 

rsbV 

r$bW 



ycdht 

ydaG 

yfiO 

ykrL 

ykzA 

yioA 

ytoU 

ynbA 

ynzF 

yocX 

yocM 

yodU 

yokG 

ypqP 

ytxG 

yuH 

yxxJ 

yveK 

yvei 

yveM 

yveN 

yveO 

yveP 

yveO 

yeR 



2627 class I heat-shock protein (chaperonin)
650 class I heat-shock protein (chaperonin)
650 class I heat-shock protein (chaperonin)
2887 trigger factor (prolyl isomerase)
1376 chaperonin 
1376 chaperonin 
3541 chaperonin 
3541 chaperonin 
OTHER FUNCTIONS PflQ 

ADAPTATION TO ATYPICAL CONDITIONS 72
2304 glutathione peroxidase
104 class III stress response-related ATPase (repres-
sor of competence)
K37 ATP-dependent Clp protease-like
3545 ATP-dependent Clp protease proteolytic subunit
(class III heat-shock protein)
1683 β-type subunit of the 20S proteasome
2385 ATP-dependent Clp protease ATP-binding subunit
(class III heat-shock protein)
16SS ATP-dependent Clp protease-like
930 stress response protein
9S4 major cold-shock protein
559 cold-shock protein
2307 cold-shock protein
2937 carbon starvation-induced protein
59 general stress protein
3256 degradative enzyme production
2308 degradative enzyme production
2625 heat-shock protein (activation of DnaK)
3135 stress- and starvation-induced gene controlled by
3186 glycine betaine aldehyde dehydrogenase (osmo-
protection)
3134 alcohol dehydrogenase (osmoprotection)
2528 heat-shock protein (activation of DnaK)
494 general stress protein
3944 general stress protein
1076 Hit-like protein involved in cell-cycle regulation
4090 class III heat-shock protein (chaperonin)
1359 serine protease Do (heat-shock protein)
1387 activation of a~
2882 class III heat-shock ATP-dependent protease
2884 Lon-like ATP-dependent protease
3333 metalloregulation DNA-binding stress protein

519 positive regulator of cr activity (interacTKin with 
RsbS) 

520 R e bT) Ve reSU ' 3tor °' c ' activ ' t y (arsonist of 
520 positive regulator ol <r activity (switch 

protein /serine kinase [RsbS]) 
indirect positive regufator of <H actvw (serine 
phosphatase {RsbV~PJ) 
positive regulator of a* activity (an£-ar.;:-siama 
factor [RsbW]) u 
negative regulator of a 3 activity (sw-tch 
protein /serine kinase [RsbVJ, ant^gma factor 
[o 1 ]) 

indirect negative regulator of o 3 aovfcy (serine 
phosohatase[RsbS-P]) 
adhesion protein 
general stress protein 
. surface adhesion 
1414 heat-shock protein 
1331 general stress protein 
1637 fibronectin-binding protein 
1655. alkaline-shock protein 
1875 GTP-binding protein protease moduiatar 
1880 6-endotoxin 

2097 general stress protein 

2098 small heat-shock protein 
2151 capsular polysaccharide biosynthesis 
2279 ^endotoxin 

2286 capsular polysaccharide btosynmes^ 
3047 general stress protein 
3047 general stress protein 
3046 general stress protein 
3529 capsular polysaccharide biosynthesis 
3528 capsular polysaccharide biosyntnesis 
3527 capsular polysaccharide biosynthesis rv.4 
3525 capsular polysaccharide btosynthess codV 
3524 exopofysaccharide biosynthesis n pX 
3523 capsular polysaccharide biosynthesis xhlA 
3522 caosularrjc^ccharidebiosynires's • xhiB 

352: spore coat por/saccharide biosyn^es^ 
3519 cacsu'3f ooM^crnflri^o h'^ev—w* .... , , 



mmr 
padC 
penP 
pksS 

sodA 
sodF 
ten 
thdF 
tmrB 
yaaN 
ybbE 
ybfO 
ybxl 
ycbj 
ycbP 
yceC 
yceO 
yceE 
yceF 
yceH 
ycsF 
ydbO 
yd(B 
ydhc 
yerP 
yeiM 
yetO 



3515 spore coat polysaccharide biosynthesis 
3384 senne protease Do 
3732 capsular polysaccharide biosynthesis 
3732 capsular polysaccharide biosynthesis 
™ "P 5 "'^ Polysaccharide biosynthesis 
3700 capsular polyglutamate biosynthesis 
3698 capsular pofyglutamate biosynthesis 
3698 capsular pofygfutamato biosynthesis 
4148 serine protease Do 

DETOXIFICATION 

2736 aminoglycoside 6-adenytyttransf erase 

<118 alky* hydroperoxide reductase (small subunit) 

4119 aikyi hydroperoxide reductase (large subunit)/ 

NADH dehydrogenase 
2493 multidrug resistance protein cotranscribed with 

omr 

342- cephalosporin Cdeacetytase - 
2732 cytochrome P450-fike enzyme 
3603 cytochrome P45Wike enzyme 
960 vegetative catalase f 
4009 catalase 2 
3964 catalase 

51 dimethyiadenosine transferase (kasuoamvein 

resistance) 
3857 methylenomycin A resistance protein 
3532 ferutate decarboxylase 
2048 lactamase 

1859 hydroxylase of the polyketide produced by the 

pteduster ' 
2565 superoxide dismutase 
2103 superoxide dismutase 
4 188 tetracycline resistance leader peptide 
4212 thiophen and furan oxidation 
339 tuntcamycin resistance 
36 toxic cation resistance 
190 



250 erythromycin esterase 

229 6-iactamase 

276 viomyon phosphotransferase 

283 toxic catk)n resistance protein 

312 - 

312 
313 

314 

316 



yfiM 
yfnC 
ygaF 
ynjG 
yisY 

W • 

yjiC 
yk'A 
ykkB 
ykoY 
yndN 
yocD 
yojK 
yojM 
yokD 
yQcM 
yofP 
yrtrj 

yrpB 
Ml 
ytnj 
yubB 

yusl . 
yvbT 
yvtiP 
yv*cH 
ywnhi 
yxel 
yxeK 
yyaR 



521 



303 
473 
910 



IV.3 
pksB 
pksC 
pksD 
pksE 
pksF 
pksG 
pksH 
pksf 
pksJ 
pksK 
pksi 
pksM 
pksN 
pksP 
pksR 
ppsA 
ppsB 
ppsC 
ppsD 
ppsE 
sbo 
sfp 
srfAA 
srfAB 
sriAC 
srfAD 
SunA 

yomB 
yukL 
yukM 



tellurium resistance protein 
tellurium resistance protein 
tellurium resistance protein 
tellurium resistance protein 
toxic anion resistance protein 
457 lactam utilization protein
496 manganese-containing catalase
531 antibiotic resistance protein
618 macrolide glycosyltransferase
732 acriflavin resistance protein
790 salicylate 1-monooxygenase
792 cytochrome P450 / NADPH-cytochrome P450
reductase
636 nitric-oxide synthase
804 fosmidomycin resistance protein
943 thiol-specific antioxidant protein
H22 monooxygenase
1169 chloride peroxidase
1291 monooxygenase
1292 macrolide glycosyltransferase
1366 immunity to bacteriotoxins
1375 N-acetyltransferase
1410 toxic anion resistance protein
1916 fosfomycin resistance protein
2088 immunity to bacteriotoxins
2M7 macrolide glycosyltransferase
2H5 superoxide dismutase
223^ aminoglycoside N-acetyltransferase
26=3 arsenate reductase
2596 penicillin tolerance
2776 cytochrome P450 / NADPH-cytochrome P450
reductase
2736 2-nitropropane dioxygenase
3017 thiol peroxidase
3002 nitrilotriacetate monooxygenase
3195 bacitracin resistance protein (undecaprenol
kinase)
3366 arsenate reductase
3437 alkanal monooxygenase
3543 reticuline oxidase
3910 monooxygenase
3760 phosphinothricin acetyltransferase
4062 penicillin amidase
4061 monooxygenase
41S5 streptothricin acetyltransferase

ANTIBIOTIC PRODUCTION... 



xkdC 
xkdO 
xkdE 
xkdF 
xkdG 
xkdH 
xkdl 
xkdJ 
xkdK 
xkdM 
xkdN 
xkdO 
xkdP 
xkdO 
xkdP. 
xkdS 
xkdT 
xkdU 
xkdV 
xkdW 
xkdX 
xkdY 
xtmA 
xlmB 
xxrA 
ycdD 
ydct 
ydcM 
ybgE 
yjbJ 
yjqB 
ymaC 
ymaH 
ymfD 
ymfE 
yndL 
yobO 
yokA 
yokL 
yolB 
yomA 
yomj 
yomP 
yomR 
yomS 
yoqD 
yoqZ 
yosO 
yqaB 
iVaJ 
ygaK 
yqaM 
yQaO 
yqaS 
yqaT 
yqbA 
yqbD 
yobE 
yQbH 

Wbl 

yQbJ 

WbK 

WbL 

yqbM 

yqbN 

yabO 

ypbP 

ytjbO 

VQbR 

ydbS 
yQbT 
yqcA 
yqcC 
yqcD 
yQcE 
yuxG 
yqxH 



1322 P8SX prophage 
'323 PSSXrxophSe 

^328 PBSXprc?ha5e 

1329 P8SX prophage 

1330 PBSxSShaS 

1331 PBSX prophage 

1331 PBSX prophage 

1332 P8SX prophage 

1333 PBSX prophage 

1334 PBSX prophage 
1334 PSSX prophage 

1338 PBSX prophage 

1339 PBSX prophage 

1340 P8SX prophage 

1340 PBSX prophage 

1341 PBSX prophage 

1342 PBSX prophage 

1343 PBSX prophage 
1345 PBSX prophage 
1345 PBSX prophage 
m Si*^^ e Wcexoenzyme 

?£ S^n^se (large subumt) 
1J24 PBSX prophage 

304 L^lancyJ-ivfiJutarnate peptidase 

530 tntegrase 

531 immunity region protein in prophage 
1090 phage infection protein 
1235 tytictransgVcosylase 
1318 r^ge^eJated replication protein 
1863 phage-retated protein 
1867 host factor-i protein 

1755 phage-related protein 

1756 phage-related protein 
19" Phage-retated replication prorein 

S oMmbtrSsT 1 ^ 9 ^ 396 ^^ 
2274 phage-related protein 
2272 phage-related protein 
2264 holin 

2248 phage-related immunity protein 
2243 phage-related protein 
2242 phage-related protein 
2241 phage-related lytic exoenzyrne 

2160 Phage-retated endodeoxyribonuclease 
2700 phage-related protein 
2696 phage-related protein 
2695 phage-related protein 
2694 phage-related protein 
2692 phage-related protein 

P^Oe-relatedterminase small subunit 
2689 Phage-related terminase large subunit 
2688 phage-related protein 
2684 phage-related protein 
2683 phage-related protein 
2682 phage-related protein 
2681 phagfrrelated protein 
2681 phage-related protein 
2680 phage-related protein 
2679 phage-related protein 
2679 phage-related protein 
2677 phage-related protein 
2677 phage-retated protein 
2672 phage-related protein 
2671 phage-related protein 
2670 phage-related protein 
2670 phage-related protein 
2670 phage-related protein 
2669 phage-related protein 
2668 phage-related protein 
2667 phage-related protein 
2666 phage-related protein 
2666 phage-related lytic exoenzyme 
2665 holin 



1782 involved in polyketide synthesis
1783 involved in polyketide synthesis
1785 involved in polyketide synthesis
1785 involved in polyketide synthesis
1783 involved in polyketide synthesis
1789 involved in polyketide synthesis
1790 involved in polyketide synthesis
1791 involved in polyketide synthesis
1792 involved in polyketide synthesis
1794 polyketide synthase
1808 polyketide synthase
1821 polyketide synthase
1834 polyketide synthase
1835 polyketide synthase
1850 polyketide synthase
1997 peptide synthetase
1990 peptide synthetase
1982 peptide synthetase
1974 peptide synthetase
1963 peptide synthetase
3835 subtilosin A
408 surfactin production
377 surfactin synthetase / competence
387 surfactin synthetase / competence
398 surfactin synthetase / competence
402 surfactin synthetase / competence
2269 sublancin 168 lantibiotic antimicrobial precursor



2264 bacteriocin 

3282 antibiotic synthetase 

3283 antibiotic synthetase 

PHAGE-RELATED FUNCTIONS 

1637 integrase/recombinase
2449 integrase/recombinase
1346 involved in cell lysis upon induction of PBSX
1346 hydrolysis of 5-bromo 4-chloroindolyl phosphate
upon induction of PBSX (holin)



IV5 
ydcP 
ydcQ 
ydcR 
ya'dB 
yddE 
ytidH 
yeiB 
yefC 
0 yneB 
yocA 

IV.6 

bex 

csbA 

csfB 

ctaG 

eag 

ecsC 

mmgE 

mfZ 

sapB 

sbc 

veg 

yact 

ybaL 

ycbU 

yerN 

yhdP 

yhdT 

yheG 

ypiO 

yqxC 

yrkA 

yrvO 

yuaG 

yurV 

yurW 

yvt! 



TRANSPOSON AND IS 

533 transposon protein 

533 transposon protein 

535 transposon protein 

537 transposon protein 

538 transposon protein 
544 transposon protein 

739 site-specific recombinase 

739 resotvase 

1918 resofvase 

2035 transposonnrelated protein 
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MISCELLANEOUS 

2610 GTP-binding protein 

3614 putative membrane protein 

36 o -transcribed gene 

1564 function unknown 

1430 small membrane protein 

1079- function unknown 

2509 function unknown 

3027 NifS protein homologue

726 mutant activates alkaline phosphatase during 

,, QC sporulattonindependenrjyof^ando 1 

1595 small basic protein 

53 function unknown 

102 creatine kinase
157 ATP-binding Mrp-like protein
287 NifS protein homologue
730 pet112-like protein
1033 hemolysin
1035 hemolysin
1049 calcium-binding protein
2295 hemolysin III homologue
2523 hemolysin-like
2720 hemolysin-like
2811 NifS protein homologue
3181 epidermal surface antigen
3357 NifU protein homologue
3358 NifS protein homologue
3309 NifU protein homologue
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twice. Among the duplications, we identified as expected, the 
ribosomal RNA genes and their flanking regions, but also regions
known to correspond to genes comprising long sequence repeats
(such as pks and srf). We also found several regions that were not
expected: a 182-bp repetition within the yyaL and yyaO genes; a 410-bp
repetition between the yxaK and yxaL genes; an internal duplication
of 174 bp inside ydcI; and significant duplications in the regions
involved in the transcriptional control of several genes (such as
118 bp repeated three times between yxbB and yxbQ). Finally, we
found several repetitions at the borders of regions that might be
involved in bacteriophage integration.

The most prominent duplication was a 190-bp element that was
repeated 10 times in the chromosome. Multiple alignment of the ten
repeats showed that they could be classified into two subfamilies
with six and three copies each, plus a copy of what appears to be a
chimaera. Similar sequences have also been described in the closely
related species Bacillus licheniformis 21,22 . A striking feature of these
repeats is that they are only found in half of the chromosome, at
either side of the origin of replication, with five repeats on each side.
Furthermore, with the exception of the most distal repeat at
position 737,062, they lie in the same orientation with respect to
the movement of the replication fork (Figs 2 and 3). Putative
secondary structures conserved by compensatory mutations, as
well as an insert in three of the copies, suggest that this element
could indicate a structural RNA molecule.

Analysis at the transcription and translation level. Over 4,000
putative protein coding sequences (CDSs) have been identified,
with an average size of 890 bp, covering 87% of the genome
sequence (Fig. 2). We found that 78% of the genes started with
ATG, 13% with TTG and 9% with GTG, which compares with 85%,
3% and 14%, respectively, in E. coli. Fifteen genes (eight in the
predicted CDSs in bacteriophage SPβ) exhibiting unusual start
codons (namely ATT and CTG) were also identified through their



Table 1 Functional classification of the Bacillus subtilis protein-coding genes

The genes of known function or encoding products similar to known proteins in
B. subtilis or in other organisms have been classified into functional categories
(2,379 genes). The total number of genes in each category is indicated after the
category title. Genes are listed in alphabetical order within each category, and
their positions (in kilobases) on the B. subtilis chromosome are indicated after the
gene names. A brief description is given for each gene. In some cases, interacting
proteins have been indicated between brackets (for example, histidine kinases
and response regulator, phosphatases and their substrates). More detailed and
constantly updated information is available in the SubtiList database (see
Methods). A preliminary assessment of the significance of sequence similarities
was obtained through an automated procedure involving a combination between
the BLAST2P probability and the percentage of amino-acid identity. Matches
considered significant were re-examined manually. It should be emphasized that
functions assigned to y genes are based only on sequence similarity information
with the best counterparts in protein databanks. Genes whose products are only
similar to other unknown proteins, or not significantly similar to any other proteins
in databanks (categories V and VI), were omitted.


Figure 2 General view of the B. subtilis chromosome. Arrows indicate the
orientation of transcription. Genes are coloured according to their classification
into broad functional categories (blue, category I; green, category II; red,
category III; orange, category IV; purple, category V; pink, category VI; see Table
1). Class 2 CDSs according to codon usage analysis are indicated by oblique
hatches, and class 3 CDSs are indicated by vertical hatches. Ribosomal RNA
genes are coloured in yellow. Transfer RNA genes are marked by triangles. Other
RNA genes are represented as white arrows. Known genes (non-y genes) are
in bold type. Putative transcription termination sites are represented as
[...]. Prophages and prophage-like elements are indicated by brown
[...] the chromosome line. The 190-bp element repeated ten times is
[...] by hatched boxes.
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similarities to known genes in other organisms or because they had a 
good GeneMark prediction (see Methods). This has not yet been 
substantiated experimentally. However, in the case of the gene 
coding for translation initiation factor 3, the similarity with its E. 
coli counterpart strongly suggests that the initiation codon is ATT, 
as is the case in E. coli.
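
The start-codon proportions quoted above (ATG, TTG, GTG, plus rare ATT and CTG starts) are a simple tally over the first codon of each annotated CDS. The following is a minimal sketch of such a tally, not the authors' procedure; the function name `start_codon_usage` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

def start_codon_usage(cds_sequences):
    """Return the percentage of CDSs that begin with each start codon."""
    # Take the first three bases of every CDS that is long enough.
    counts = Counter(seq[:3].upper() for seq in cds_sequences if len(seq) >= 3)
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}

# Toy input (hypothetical sequences, not real B. subtilis CDSs).
toy_cds = ["ATGAAATAA", "ATGCCCTGA", "TTGAAATAG", "GTGCCCTAA"]
usage = start_codon_usage(toy_cds)
for codon, pct in sorted(usage.items()):
    print(f"{codon}: {pct:.1f}%")
```

Applied to a full set of annotated CDSs, the same tally would also surface the unusual start codons as low-frequency entries.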

We have not annotated CDSs that largely or entirely overlap 
existing genes, although such genes (for example, comS inside 
srfAA) certainly exist. It is also likely that some of the short CDSs 
present in the B. subtilis genome have been overlooked. For these 
reasons and possible sequencing errors, the estimated number of B. 
subtilis CDSs will fluctuate around the present figure of 4,100. 

In several cases, in-frame termination codons or frameshifts were
confirmed to be present on the chromosome (for example, an
internal termination codon in ywtF, or the known programmed
translational frameshift in prfB), indicating that the genes are either
non-functional (pseudogenes) or subject to regulatory processes. It
will therefore be of interest to determine whether these gene features
are conserved in related Bacillus species, especially as strain 168 is
derived from the Marburg strain that was subjected to X-ray
irradiation.

A few regions do not have any identifiable feature indicating that
they are transcribed: they could be 'grey holes' of the type described
in E. coli 24 . Preliminary studies involving all regions of more than
400 bp without annotated CDSs indicated that, of ~300 such
regions, only 15% were likely to be really devoid of protein- 
coding sequences. One of the longest such regions, located between 
yfjO and yfjN, is 1,628 bp long. Grey holes seem generally to be
clustered near the terminus of replication. However, a grey-hole 
cluster located at ~600 kb might be related to the temporary
chromosome partition observed during the first stages of sporula- 
tion, when a segment of about one-third of the chromosome enters 
the prespore, and remains the sole part of the chromosome in the 
prespore for a significant transition period 23 . 
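
The screen for candidate grey holes described above (regions of more than 400 bp without annotated CDSs) amounts to finding long gaps between CDS intervals. The sketch below illustrates that idea under stated assumptions: coordinates are 1-based and inclusive, the chromosome is treated as linear rather than circular, and the function name `unannotated_regions` and the toy coordinates are hypothetical.

```python
def unannotated_regions(cds_intervals, genome_length, min_length=400):
    """Return (start, end) gaps longer than min_length bp with no annotated CDS.

    cds_intervals: iterable of (start, end) pairs, 1-based and inclusive.
    Assumption: the chromosome is treated as linear for simplicity.
    """
    gaps = []
    covered_to = 0  # right-most position covered by a CDS so far
    for start, end in sorted(cds_intervals):
        if start - covered_to - 1 > min_length:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)  # handles overlapping CDSs
    if genome_length - covered_to > min_length:
        gaps.append((covered_to + 1, genome_length))
    return gaps

# Toy coordinates (hypothetical, not taken from the B. subtilis annotation).
toy_cds = [(1, 900), (950, 2000), (2500, 3000), (2900, 3400)]
gaps = unannotated_regions(toy_cds, genome_length=4000)
print(gaps)
```

Each gap returned this way is only a candidate; as the text notes, most such regions turned out on closer inspection to contain protein-coding sequences.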

The codon usage of B. subtilis CDSs was analysed using factorial 
correspondence analysis 17 . We found that the CDSs of B. subtilis 
could be separated into three well-defined classes (Fig. 4). Class 1 
comprises the majority of the B. subtilis genes (3,375 CDSs),
including most of the genes involved in sporulation. Class 2 (188 
CDSs) includes genes that are highly expressed under exponential 
growth conditions, such as genes encoding the transcription and 
translation machineries, core intermediary metabolism, stress pro- 
teins, and one-third of genes of unknown function. Class 3 (537 
CDSs) contains a very high proportion of genes of unidentified 
function (84%), and the members of this class have codons enriched 
in A + T residues. These genes are usually clustered into groups 
between 15 and 160 genes (for example, bacteriophage SPβ) and
correspond to the A + T-rich islands described above (Fig. 1). 
When they are of known function, or when their products display 
similarity to proteins of known function, they usually correspond to 
functions found in, or associated with, bacteriophages or trans- 
posons, as well as functions related to the cell envelope. This 
includes the region ydc/ydd/yde (40 genes that are missing in 
some B. subtilis strains 26 ), where gene products showing similarities 
to bacteriophage and transposon proteins are intertwined. Many of 
these genes are associated with virulence genes identified in patho- 
genic Gram-positive bacteria, suggesting that such virulence factors 
are transmitted horizontally among bacteria at a much higher 
frequency than previously thought. If we include these A + T-rich 
regions as possible cryptic phages, together with known bacterio- 
phages or bacteriophage-like elements (SPβ, PBSX and the skin
element), we find that the genome of B. subtilis 168 contains at least 
10 such elements (Figs 2 and 3). Annotation of the corresponding 
regions often reveals the presence of genes that are similar to 
bacteriophage lytic enzymes, perhaps accounting for the observa- 
tion that B. subtilis cultures are extremely prone to lysis. 
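
The codon-usage classes above rest on per-CDS codon counts, and class 3 is characterized by codons enriched in A + T. The sketch below shows only those two raw quantities for a single CDS; it is not the factorial correspondence analysis used in the paper, and the function names and the toy sequence are hypothetical. It also checks that the three class sizes quoted in the text add up to the total CDS figure of 4,100.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons of a CDS (trailing partial codon ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def at_fraction(cds):
    """Fraction of A or T residues in a CDS."""
    cds = cds.upper()
    return sum(base in "AT" for base in cds) / len(cds)

# Toy CDS (hypothetical sequence for illustration only).
toy = "ATGAAATTTGGGTAA"
counts = codon_counts(toy)
print(counts.most_common(), round(at_fraction(toy), 3))

# Consistency check on the class sizes quoted in the text.
class_sizes = {"class 1": 3375, "class 2": 188, "class 3": 537}
print(sum(class_sizes.values()))
```

A real classification would feed the codon-count vectors of all CDSs into a multivariate method; the A + T fraction is shown only because the text uses it to describe class 3.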
The ribosomal RNA genes have been previously identified and 
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shown to be organized into ten rRNA operons, mainly clustered 
around the origin of replication of the chromosome (Figs 2 and 3). 
In addition to the 84 previously identified tRNA genes, by using the 
Palingol 27 and tRNAscan 28 programs, we propose four putative new 
tRNA loci (at 1,262 kb, 1,945 kb, 2,003 kb and 2,899 kb), specific for
lysine, proline and arginine (UUU, GGG, CCU and UCU antic- 
odons, respectively). The 10S RNA involved in degradation of 
proteins made from truncated mRNA has been identified (ssrA), 
as well as the RNA component of RNase P (rnpB) and the 4.5S RNA 
involved in the secretion apparatus (scr). 

There is a strong transcription orientation bias with respect to the 
movement of the replication fork: 75% of the predicted genes are 
transcribed in the direction of replication. Plotting the density of 
coding nucleotides in each strand along the chromosome readily 
identifies the replication origin and terminus (Fig. 3). To identify 
putative operons, we followed ref. 29 for describing Rho- 
independent transcription termination sites. This yielded ~1,630
putative terminators (340 of which were bidirectional). We retained
only those that were located less than 100 bp downstream of a gene,
or that were considered by the program to be 'very strong' (in order
to account for possible erroneous CDSs). This yielded a total of
~1,250 terminators, with a mean operon size of three genes. A
similar approach to the identification of promoters is problematical,
especially because at least 14 sigma factors, recognizing
different promoter sequences, have been identified in B. subtilis.
Nevertheless, the consensus of the main vegetative sigma factor (σA)
appears to be identical to its counterpart in E. coli (σ70):
5'-TTGACA-N17-TATAAT-3'. Relaxing the constraints of the similarity
to sigma-specific consensus sequences led to an extremely high
number of false-positive results, suggesting that the consensus-
oriented approach to the identification of promoters should be
replaced by another approach 17.
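
The false-positive problem described above can be illustrated with a minimal sketch. It scans a sequence for the quoted consensus (TTGACA, a 17-base spacer, TATAAT) while tolerating a chosen number of mismatches. The function names, the fixed spacer and the random background sequence are assumptions made for this illustration only; they are not the procedure used by the authors.

```python
import random

# Consensus quoted in the text: 5'-TTGACA-N17-TATAAT-3'.
# The mismatch budget and the fixed spacer are illustrative assumptions.
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq: str, max_mismatch: int = 0, spacer: int = 17):
    """Return 0-based start positions whose -35 and -10 boxes together
    differ from the consensus at no more than max_mismatch positions."""
    seq = seq.upper()
    span = len(MINUS35) + spacer + len(MINUS10)
    hits = []
    for i in range(len(seq) - span + 1):
        box35 = seq[i:i + len(MINUS35)]
        box10 = seq[i + len(MINUS35) + spacer:i + span]
        if mismatches(box35, MINUS35) + mismatches(box10, MINUS10) <= max_mismatch:
            hits.append(i)
    return hits


def random_dna(n: int, seed: int = 0) -> str:
    """Uniform random DNA, used here only as a stand-in background."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    background = random_dna(50_000)
    # The hit count on random sequence can only grow as the constraint is relaxed.
    for k in range(5):
        print(k, len(scan_promoters(background, max_mismatch=k)))
```

Each increase of the mismatch budget admits every earlier hit plus new ones, so the number of matches on unrelated sequence never decreases, which is the effect the text reports for relaxed consensus searches.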

Classification of gene products 

Genes were classified according to ref. 14, based on the representa- 
tion of cells as Turing machines in which one distinguishes between 
the machine and the program (Table 1). Using the BLAST2P
software running against a composite protein databank composed
of SWISS-PROT (release 34), TREMBL (release 3, update 1) and B.
subtilis proteins, we assigned at least one significant counterpart
with a known function to 58% of the B. subtilis proteins. Thus, for up
to 42% of the gene products, the function cannot be predicted by
similarity to proteins of known function: 4% of the proteins are
similar only to other unknown proteins of B. subtilis; 12% are
similar to unknown proteins from some other organism; and 26%
of the proteins are not significantly similar to any other proteins in
databanks. This preliminary analysis should be interpreted with
caution, because only ~1,200 gene functions (30%) have been
experimentally identified in B. subtilis. We used the y prefix in gene
names to emphasize that the function has not been ascertained
(2,853 y genes, representing 70%).

Regulatory systems. Transcription regulatory proteins. Helix-
turn-helix proteins form a large family of regulatory proteins
found in both prokaryotes and eukaryotes. There are several classes,
including repressors, activators and sigma factors. Using BLAST
searches, we constructed consensus matrices for helix-turn-helix
proteins to analyse the B. subtilis protein library. We identified [...]
sigma or sigma-like factors, of which nine (including a new one) are
of the SigA type. We also putatively identified 20 regulators (among
which 18 were products of y genes) of the GntR family, [...]
regulators (15 y genes) of the LysR family, and 12 regulators ([...]
y genes) of the LacI family. Other transcription regulatory proteins
were of the AraC family (11 members, 10 y), the Lrp family ([...]
members, 3 y), the DeoR family (6 members, 3 y), or additional
families (such as the MarR, ArsR or TetR families). A puzzling
observation is that several regulatory proteins display significant
similarity to aminotransferases (seven such enzymes have been
identified as showing similarity to repressors).
Two-component signal-transduction pathways. Two-component
regulatory systems, consisting of a sensor protein kinase and a
response regulator, are widespread among prokaryotes. We have
identified 34 genes encoding response regulators in B. subtilis, most
of which have adjacent genes encoding histidine kinases. Response
regulators possess a well-conserved N-terminal phospho-acceptor
domain 30, whereas their C-terminal DNA-binding domains share
similarities with previously identified response regulators in E. coli,
Rhizobium meliloti, Klebsiella pneumoniae or Staphylococcus aureus.
Representatives of the four subfamilies recently identified in E. coli







Figure 3 Density of coding nucleotides along the
B. subtilis chromosome. Yellow stands for the
density of coding nucleotides in both strands of
the sequence; red indicates the density of coding
nucleotides in the clockwise strand (nucleotides
involved in genes transcribed in the clockwise
orientation). The movement of the replication
forks is represented by arrows. Ribosomal RNA
operons are indicated by brown boxes. Known
prophages and prophage-like elements are
represented as blue lines. The 190-bp element
repeated ten times is represented by green lines.




Figure 4 Factorial correspondence analysis of codon usage in the B. subtilis
CDSs. Red dots, genes from class 1; green triangles, genes from class 2; blue
crosses, genes from class 3. Class 2 contains genes coding for the translation
and transcription machineries, and genes of the core intermediary metabolism.
Class 3 genes correspond to codons strongly enriched in A or T in the wobble
position; they generally belong to prophage-like inserts in the genome.

(OmpR, FixJ, CitB and LytR) have been identified in B. subtilis. In a
fifth subfamily, CheY, the DNA-binding domain is absent. The
DNA-binding domain of a single B. subtilis response regulator,
YesN, shares similarity with regulatory proteins of the AraC family.
Quorum sensing. The B. subtilis genome contains 11 aspartate
phosphatase genes, whose products are involved in dephosphoryla-
tion of response regulators, that do not seem to have counterparts in
Gram-negative bacteria such as E. coli. Downstream from the
corresponding genes are some small genes, called phr, encoding
regulatory peptides that may serve as quorum sensors 32. Seven phr
genes have been identified so far, including three new genes (phrG,
phrI and phrK).






Protein secretion. It is known that B. subtilis and related Bacillus 
species, in particular B. licheniformis and B. amyloliquefaciens, have 
a high capacity to secrete proteins into the culture medium. Several 
genes encoding proteins of the major secretion pathway have been 
identified: secA, secD, secE, secF, secY, ffh and ftsY. Surprisingly, there 
is no gene for the SecB chaperone. It is thought that other 
chaperone(s) and targeting factor(s), such as Ffh and FtsY, may 
take over the SecB function. Further, although there is only one such 
gene in E. coli, five type I signal peptidase genes (sipS, sipT, sipU, sipV
and sipW) have been found 33. The lsp gene, encoding a type II signal
peptidase required for processing of lipo-modified precursors, was 
also identified. PrsA, located at the outer side of the membrane, is 
important for the refolding of several mature proteins after their 
translocation through the membrane. 

Other families of proteins. ABC transporters were the most 
frequent class of proteins found in B. subtilis. They must be 
extremely important in Gram-positive bacteria, because they have 
an envelope comprising a single membrane. ABC transporters will 
therefore allow such bacteria to escape the toxic action of many 
compounds. We propose that 77 such transporters are encoded in
the genome. In general, they involve the interaction of at least three
gene products, specified by genes organized into an operon. Other
families comprised 47 transport proteins similar to facilitators (and
perhaps sometimes part of the ABC transport systems), 18 amino-
acid permeases (probably antiporters), and at least 16 sugar trans-
porters belonging to the PEP-dependent phosphotransferase
system. 

General stress proteins are important for the survival of bacteria 
under a variety of environmental conditions. We identified 43 
temperature-shock and general stress proteins displaying strong 
similarity to E. coli counterparts.

Missing genes. Histone-like proteins such as HU and H-NS have
been identified in E. coli. We found that B. subtilis encodes two
putative histone-like proteins that show similarity to E. coli HU,
namely HBsu and YonN, but found no homologue to H-NS. It is
known that the hbs gene encoding HBsu is essential, but we do not
expect the yonN gene to be essential because it is present in the SPβ
prophage. IHF is similar to HU, and it is not known whether HBsu
plays a similar role to that of IHF in E. coli. Similarly, no protein
similar to FIS could be found.

Genes encoding products that interact with methylated DNA,
such as seqA in E. coli, involved in the regulation of replication
initiation timing, or mutH, the endonuclease recognizing the newly
synthesized strand during mismatch repair at hemi-methylated
GATC sites, are also missing. This is in line with the absence of
known methylation in B. subtilis, equivalent to Dam methylation in
E. coli. Similarly, E. coli sfiA, encoding an inhibitor of FtsZ action in
the SOS response, has no counterpart in B. subtilis. In contrast, B.
subtilis replication initiation-specific genes, such as dnaB and dnaD,
are missing in E. coli. The exact counterpart of the E. coli mukB gene,
involved in chromosome partitioning, does not exist in B. subtilis,
but genes spo0J and smc (Smc is weakly similar to MukB), which are
suggested to be involved in partitioning of the B. subtilis chromo-
some, are missing in E. coli.

Figure 5 Paralogue distribution in the genome of B. subtilis. Each B. subtilis
protein has been compared with all other proteins in the genome, using a Smith
and Waterman algorithm. The baseline is established by making a similar
comparison using 100 independent random shuffles of the protein sequence
(Z-score > 13).

Turnover of mRNA is controlled in E. coli by a 'degradosome'
comprising RNase E. It has a counterpart in B. subtilis, but we failed
to find a clear homologue of RNase E in this organism. Whether this
is related to the role of ribosomal protein S1 as an RNA helicase
involved in mRNA turnover in E. coli requires further investigation.
In particular, a homologue of rpsA (S1 structural gene), ypfD, might
be involved in a structure homologous to the degradosome.
Structurally unrelated genes of similar function. Several genes
encode products that have similar functions in E. coli and B. subtilis
but have no evident common structure. This is the case for the
helicase loader genes, E. coli dnaC and B. subtilis dnaI; the genes
coding for the replication termination protein, E. coli tus and B.
subtilis rtp; and the division topology specifier genes, E. coli minE
and B. subtilis divIVA. The situation may even be more complex in
multisubunit enzymes: B. subtilis synthesizes two DNA polymerase
III α chains, one having 3'-5' proofreading exonuclease activity
(PolC) and the other without the exonuclease activity (DnaE); in E.
coli, only the latter exists. E. coli DNA polymerase II is structurally
related to DNA polymerase α of eukaryotes, whereas B. subtilis YshC
is related to DNA polymerase β.



from within the SPβ genome. In this latter case, [...] corresponding
to the large subunit both contains an intron and [...]
this enzyme also contains an intron, encoding an endonuclease, as
was found for the homologue in bacteriophage T4 [...].
By similarity with genes from other organisms, there [...]
be, in addition to genes involved in amino-acid degradation (such
as the roc operon, which degrades arginine and related amino acids),
a large number of genes involved in the degradation of molecules
such as opines and related molecules, derived from plants [...]
also in line with the fact that B. subtilis degrades polygalacturonate,
and suggests that, in its biotope, it forms specific relations [...]
Secondary metabolism. In addition to many genes coding for
degradative enzymes, almost 4% of the B. subtilis genome codes
for large multifunctional enzymes (for example, the srf, pps and pks
loci), similar to those involved in the synthesis of antibiotics in other
genera of Gram-positive bacteria such as Streptomyces. Natural
isolates of B. subtilis produce compounds with antibiotic activity,
such as surfactin, fengycin and difficidin, that can be related to the
above-mentioned loci. This bacterium therefore provides a simple
and genetically amenable model in which to study the synthesis

[...] its regulation. These pathways are often organized in
very long operons (for example, the pks region spans 78.5 kb, about
[...] of the genome). The corresponding sequences are mostly
located near the terminus of replication, together with prophages
and prophage-like sequences.



Metabolism of small molecules 

The type and range of metabolism used for the interconversion of 
low-molecular-weight compounds provide important clues to an 
organism's natural environment(s) and its biological activity. Here
we briefly outline the main metabolic pathways of B. subtilis before
the reconstruction of these pathways in silico, the correlation of
genes with specific steps in the pathway, and ultimately the predic- 
tion of patterns of gene expression. 

Intermediary metabolism. It has long been known that B. subtilis
can use a variety of carbohydrates. As expected, it encodes an
Embden-Meyerhof-Parnas glycolytic pathway, coupled to a func-
tional tricarboxylic acid cycle. Further, B. subtilis is also able to grow
anaerobically in the presence of nitrate as an electron acceptor. This
metabolism is, at least in part, regulated by the FNR protein,
binding to sites upstream of at least eight genes (four sites experi-
mentally confirmed and four putative ones). A noteworthy feature
of B. subtilis metabolism is an apparent requirement of branched
short-chain carboxylic acids for lipid biosynthesis. Branched-
chain 2-keto acid decarboxylase activity exists and may be linked
to a variety of genes, suggesting that B. subtilis can synthesize and
utilize linear branched short-chain carboxylic acids and alcohols.
Amino-acid and nucleotide metabolism. Pyrimidine metabolism
of B. subtilis seems to be regulated in a way fundamentally different
from that of E. coli, as it has two carbamylphosphate synthetases
(one specific for arginine synthesis, the other for pyrimidine).
Additionally, the aspartate transcarbamylase of B. subtilis does not
act as an allosteric regulator as it does in E. coli. As in other
microorganisms, pyrimidine deoxyribonucleotides are synthesized
from ribonucleoside diphosphates, not triphosphates. The cytidine
diphosphate required for DNA synthesis is derived from either the
salvage pathway of mRNA turnover or from the synthesis of
phospholipids and components of the cell wall. This means that
polynucleotide phosphorylase is of fundamental importance in
nucleic acid metabolism, and may account for its important role
in competence. Two ribonucleoside reductases, both of class I,
[...], are encoded by the B. subtilis chromosome, in one case
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Paralogues and orthologues 

It is important to relate intermediary metabolism to genome
structure, function and evolution. We therefore compared the B.
subtilis proteins with themselves, as well as with proteins from
known complete genomes, using a consistent statistical method that
allows the evaluation of unbiased probabilities of similarity
between proteins. For Z-scores higher than 13, the number of
proteins similar to each given protein does not vary, indicating that
[...] of proteins that are significantly similar.
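
A shuffle-based Z-score of the kind referred to here can be sketched as follows. The sketch scores a pair of sequences by local alignment, rescoring against 100 random shuffles of one sequence to obtain a baseline, and reports how many standard deviations the real score lies above that baseline. The identity-based scoring (match 2, mismatch -1, linear gap -2) and the function names are assumptions chosen to keep the example short; the original analysis may have used a substitution matrix and different gap penalties.

```python
import math
import random
import statistics


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            if cell > best:
                best = cell
        prev = cur
    return best


def z_score(a: str, b: str, shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of the real alignment score against shuffled versions of b."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    letters = list(b)
    null = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        null.append(local_score(a, "".join(letters)))
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    if sd == 0:
        # Degenerate baseline: every shuffle scored the same.
        return 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    return (observed - mean) / sd


if __name__ == "__main__":
    p = "HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG"
    print(z_score(p, p))
```

Shuffling keeps the amino-acid composition fixed, so a high Z-score reflects the order of residues and not merely a shared composition bias.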

Families of paralogues. Many of the paralogues constitute large
families of functionally related proteins, involved in the transport of
compounds into and out of the cell, or involved in transcription
regulation. Another part of the genome consists of gene doublets
(568 genes), triplets (273 genes), quadruplets (168 genes) and
quintuplets (100 genes). Finally, about half of the genome consists
of genes coding for proteins with no apparent paralogues (Fig. 5).
No large family comprises only proteins without any similarity to
proteins of known function.

The process by which paralogues are generated is not
understood, but we might find clues by studying some of the
duplications in the genome. Several approximate DNA repetitions,
associated with very high levels of protein identity, were found,
mainly within regions putatively or previously identified as pro-
phages. This is in line with previous observations about PBSX and
the skin element, and suggests that these prophage-like elements
share a common ancestor and have diverged relatively recently. In
addition, several protein duplications are in genes that are located
very close to each other, such as yukL and dhbF (the corresponding
proteins are 65% identical in an overlap of 580 amino acids), [...]
and yugK (proteins 73% identical), yxjG and [...] (proteins [...]
identical), and the entire opuB operon, which is duplicated [...]
away (opuC operon, yielding ~80% of amino-acid identity in the
corresponding proteins).
The study of paralogues showed that, as in other genomes, a few
classes of genes have been highly expanded. This argues against the
idea of the genome evolving through a series of duplications of
ancestral genomes, but rather for the idea of genes as living
organisms, subject to evolutionary constraints, some being sub-
mitted to expansion and natural selection, and others to local
duplications of DNA regions.

Among paralogue doublets, some were unexpected, such as the
three aminoacyl tRNA synthetase doublets (hisS (2,817 kb) and
hisZ (3,588 kb); thrS (2,960 kb) and thrZ (3,855 kb); tyrS (3,036 kb)
and tyrZ (3,945 kb)) or the two mutS paralogues (mutS and yshD).
This latter situation is similar to that found in Synechocystis. In the
case of B. subtilis, the presence of two MutS proteins could indicate
that there are two different pathways for long-patch mismatch
repair, possibly a consequence of the active genetic transformation
mechanism of B. subtilis.

Families of orthologues. Because Mycoplasma spp. are thought to
be derived from Gram-positive bacteria similar to B. subtilis, we
compared the B. subtilis genome with that of M. genitalium. Among
the 450 genes encoded by M. genitalium, the products of 300 are
similar to proteins of B. subtilis. Among the 146 remaining gene
products, a further 3 are similar to proteins of other Bacillus species,
and 9 to proteins of other Gram-positive bacteria; 25 are similar to
proteins of Gram-negative bacteria; and 19 are similar to proteins of
other Mycoplasma spp. This leaves only 90 genes that would be
specific to M. genitalium and might be involved in the interaction of
this organism with its host.

The B. subtilis genome is similar in size to that of E. coli. Because
these bacteria probably diverged more than one billion years ago, it
is of evolutionary value to investigate their relative similarity. About
1,000 B. subtilis genes have clear orthologous counterparts in E. coli
(one quarter of the genome). These genes did not belong either to
the prophage-like regions or to regions coding for secondary
metabolism (~15% of the B. subtilis genome). This indicates that
a large fraction of these genomes shared similar functions. At first
sight, however, it seems that little of the operon structure has been
conserved. We nevertheless found that ~100 putative operons or
parts of operons were conserved between E. coli and B. subtilis.
Among these, ~12 exhibited a reshuffled gene order (typically, the
arabinose operon is araABD in B. subtilis and araBAD in E. coli). In
addition to the core of the translation and transcription machinery,
we identified other classes of operons that were well conserved
between the two organisms, including major integrated functions
such as ATP synthesis (atp operon) and electron transfer (cta and
[...] operons). As well as being well preserved, the murein bio-
synthetic region was partly duplicated, allowing creation of part of
the genes required for the sporulation division machinery 41. The
amino-acid biosynthesis genes differ more in their organization: the
E. coli genes for arginine biosynthesis are spread throughout
the chromosome, whereas the arginine biosynthesis genes of
B. subtilis form an operon. The same is true for purine biosynthetic
genes. Genes responsible for the biosynthesis of coenzymes and
prosthetic groups in B. subtilis are often clustered in operons that
differ from those found in E. coli. Finally, several operons conserved
in E. coli and B. subtilis correspond to unknown functions, and
should therefore be priority targets for the functional analysis of
model genomes.
Comparison with Synechocystis PCC6803 revealed about 800
orthologues. However, in this case the putative operon structure
is extremely poorly conserved, apart from four of the ribosomal
protein operons, the groES-groEL operon, yfnHG (respectively in
Synechocystis, rfbFG), rpsB-tsf, ylxS-nusA-infB, asd-dapGA-ymfA,
[...] efp-accB, grpE-dnaK, yurXW. The nine-gene atp operon of
B. subtilis is split into two parts in Synechocystis: atpBE and

Conclusion

The [...] physiology and molecular biology of B. subtilis
have been extensively studied over the past 40 years. In particular, B.
subtilis has been used to study postexponential phase phenomena
such as sporulation and competence for DNA uptake. The genome
sequences of E. coli and B. subtilis provide a means of studying the
evolutionary divergence, one billion years ago, of eubacteria into the
Gram-positive and Gram-negative groups. The availability of
powerful genetic tools will allow the B. subtilis genome sequence
data to be exploited fully within the framework of a systematic
functional analysis program, undertaken by a consortium of 19
European and 7 Japanese laboratories coordinated by S. D. Ehrlich
(INRA, Jouy-en-Josas, France) and by N. Ogasawara and H.
Yoshikawa (Nara Institute of Science and Technology, Nara,
Japan).

Methods 

Genome cloning and sequencing. An international consortium was 
established to sequence the genome of B. subtilis strain 168 (refs 9, 10, 42). 
At its peak, 25 European, seven Japanese and one Korean laboratory partici- 
pated in the program, together with two biotechnology companies. Five 
contiguous DNA regions totalling 0.94 Mb, and two additional regions of
0.28 and 0.14 Mb, were sequenced by the Japanese partners, while the European
partners sequenced a total of 2.68 Mb. A few sequences from strain 168
published previously were not resequenced when long overlaps did not indicate 
differences. 

A major technical difficulty was the inability to construct in E. coli gene 
banks representative of the whole B. subtilis chromosome using vectors that
have proved efficient for other sources of bacterial DNA (such as bacteriophage 
or cosmid vectors). This was due to the generally very high level of expression of 
B. subtilis genes in E. coli, leading to toxic effects. This limitation was overcome 
by: cloning into a variety of vectors 90 '* 4 ; using an E. coli strain maintaining low-
copy number plasmids 44; using an integrative plasmid/marker rescue genome-
walking strategy* 4 ; and in vitro amplification using polymerase chain reaction
(PCR) techniques 45,46.

Although cloning vectors were used in the early stages as templates for 
sequencing reactions, they were largely superseded in the later stages by long- 
range and inverse PCR techniques. To reduce sequencing errors resulting from 
PCR amplification artefacts, at least eight amplification reactions were
performed independently and subsequently pooled. The various sequencing
groups were free to choose their own strategy, except that all DNA sequences 
had to be determined entirely on both strands. 

Sequence annotation and verification. The sequences were annotated by the 
groups, and sent to a central depository at the Institut Pasteur 14. The Japanese
sequences were also sent there through the Japanese depository at the Nara 
Institute of Science and Technology. The same procedures were used to identify 
CDSs and to detect frameshifts. They were embedded within a cooperative 
computer environment dedicated to automatic sequence annotation and 
analysis 39 . In a first step, we identified in all six possible frames the open 
reading frames (ORFs) that were at least 100 codons in length. In a second step, 
three independent methods were used: the first method used the GeneMark 
coding-sequence prediction method 47 together with the search for CDSs 
preceded by typical translation initiation signals (5'-AAGGAGGTG-3'), 
located 4-13 bases upstream of the putative start codons (ATG, TTG or 
GTG); the second method used the results of a BLAST2X analysis performed on 
the entire B. subtilis genome against the non-redundant protein databank at the
NCBI; and the third method was based on the distribution of non-overlapping
trinucleotides or hexanucleotides in the three frames of an ORF 48.
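
The first step described above (ORFs of at least 100 codons in all six frames) and the search for a translation initiation signal 4-13 bases upstream of the start codon can be sketched as follows. This is a minimal illustration under stated assumptions: an ORF is taken to run from the first ATG, TTG or GTG after a stop codon to the next in-frame stop; the spacing is measured from the 3' end of the AAGGAGGTG motif to the start codon; and the mismatch tolerance is arbitrary. The function names are hypothetical and the code is not the consortium's annotation software.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "TTG", "GTG"}          # start codons listed in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int, min_codons: int):
    """Yield (start, end) for start-to-stop ORFs in one forward frame.
    Coordinates are 0-based, end exclusive, and include the stop codon."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon in STARTS:
            start = i
        elif codon in STOPS:
            if start is not None and (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def six_frame_orfs(seq: str, min_codons: int = 100):
    """Return (strand, frame, start, end) in forward-strand coordinates."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    found = []
    for frame in range(3):
        for s, e in orfs_in_frame(seq, frame, min_codons):
            found.append(("+", frame, s, e))
        for s, e in orfs_in_frame(rc, frame, min_codons):
            found.append(("-", frame, n - e, n - s))  # map back to forward coordinates
    return found


def has_rbs(seq: str, start: int, motif: str = "AAGGAGGTG",
            min_gap: int = 4, max_gap: int = 13, max_mismatch: int = 2) -> bool:
    """True if an approximate motif ends min_gap..max_gap bases before start."""
    for gap in range(min_gap, max_gap + 1):
        begin = start - gap - len(motif)
        if begin < 0:
            continue
        window = seq[begin:begin + len(motif)]
        if sum(a != b for a, b in zip(window, motif)) <= max_mismatch:
            return True
    return False


if __name__ == "__main__":
    demo = "CC" + "AAGGAGGTG" + "ACGTACG" + "ATG" + "GCT" * 100 + "TAA" + "CC"
    for orf in six_frame_orfs(demo):
        print(orf, has_rbs(demo, orf[2]) if orf[0] == "+" else None)
```

In practice the candidates from such a scan would then be weighed against the coding-potential and similarity evidence that the text describes as the second step.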

In general, frameshifts and missense mutations generating termination 
codons or eliminating start codons are relatively easy to detect. We shall devise a 
procedure for detecting another type of error, GC instead of CG or vice versa, 
which are much more difficult to identify. It should be noted that putative 
frameshift errors should not be corrected automatically. The sequences of the 
flanking regions of a 500-bp fragment centred around a putative error were sent 
to an independent verification group, which performed PCR amplifications 
using chromosomal DNA as template, and sequenced the corresponding DNA 
products. 

Organization and accessibility of data. The B. subtilis sequence data have 
been combined with data from other sources (biochemical, physiological and 
genetic) in a specialized database, SubtiList 49 , available as a Macintosh or 
Windows stand-alone application (4th Dimension runtime) by anonymous 
FTP at ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList. SubtiList is also accessible
through a World-Wide Web server at http://www.pasteur.fr/Bio/SubtiList.html,
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OU Streptococcus Pyogenes Sequence Blast 
Server Results 



TBLASTN 1.3.9 [29-Oct-93] 

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, 
and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol.
215:403-410. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 
Query= mrwypwlrpd feklvasyqa grghhalliq alpgmgddal iyalsryllc qqpqghkscg 

(274 letters) 
Database: /strep/abi/spphrap/auto_strep 

139 sequences; 1,816,476 total letters. 
Searching done 

                                                              Smallest
                                                              Poisson
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score  P(N)      N

Contig218                                            +3     122  3.3e-10   1
Contig203                                            +1     100  4.0e-07   1
Contig215                                            +3      49  0.95      2
Contig173                                            -1      42  0.99      4



>Contig218 

Length = 36,214 
Plus Strand HSPs : 
Score = 122 (56.3 bits), Expect = 3.3e-10, P = 3.3e-10
Identities = 31/97 (31%), Positives = 47/97 (48%), Frame = +3
Query: 2 CRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLT 61

C C + + T+ + + GVD +R++ +K KV + + +L+ 

Sbjct: 33567 CNQCDICRDITNGSLEDVIEIDAASNNGVDEIRDIRDKSTYAPSRATYKVYIIDEVHMLS 33746 
Query: 62 DAAANALLKTLEEPPAETWFFLATREPERLLATLRSR 98 

A NALLKTLEEP F LAT E ++ AT+ SR 

Sbjct: 33747 TGAFNALLKTLEEPTENWFILATTELHKIPATILSR 33857 
>Contig203 

Length = 23,545 
Plus Strand HSPs : 
Score = 100 (46.1 bits), Expect = 4.0e-07, P = 4.0e-07 
Identities = 22/71 (30%), Positives = 39/71 (54%), Frame = +1 
Query: 31 DAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLEEPPAETWFFLATREPER 90

D V+E+ ++ +V + D + AAN+LLK +EEP E + FL T + + 

Sbjct: 18139 DWKEMMANFSQTGYENKRQVFIIKDCDKMHINAANSLLKYIEEPQGEAYIFLLTNDDNK 18318 
Query: 91 LLATLRSRCRL 101 

+L T++SR ++ 
Sbjct: 18319 VLPTIKSRTQV 18351 
Score = 58 (26.8 bits), Expect = 0.00034, Poisson P(2) = 0.00034 
Identities = 10/21 (47%), Positives = 12/21 (57%), Frame = +1 
Query: 1 HCRGCQLMQAGTHPDYYTLAP 21 

HCR CQL+ + G D LP 
Sbjct: 18055 HCRSCQLIEQGDFADVTVLEP 18117 
>Contig215 

Length = 27,361 
Plus Strand HSPs: 
Score = 49 (22.6 bits), Expect = 5.8, P = 1.0 
Identities = 11/30 (36%), Positives = 19/30 (63%), Frame = +3 
Query: 168 DWYSLLAALNHEQAPARLHWLATLLMDALK 197 



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998]
Reference: Gish, Warren (1994-1997). unpublished.

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J.
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 



Query= delta prime 

(334 letters) 



Database: /usr/local/db/e_faecalis

293 sequences; 3,209,119 total letters. 
Searching 10 20 30 40 50 60 70 80 90 100% done 



                                                              Smallest
                                                                Sum
                                                  Reading  High  Probability
Sequences producing High-scoring Segment Pairs:    Frame  Score  P(N)      N

6277                                                 -1     210  9.6e-16   1
6250                                                 -2     162  2.9e-10   1



>6277 

Length = 9336 



Minus Strand HSPs : 



Score = 210 (73.9 bits), Expect = 9.6e-16, P = 9.6e-16 
Identities = 62/218 (28%), Positives = 105/218 (48%), Frame = -1 



Query: 11 FEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQA 70 

+++L S++ GR HA L + G G +++++ C + C C C + 

Sbjct: 8865 YKQLQKSFEHGRLAHAYLFEGDTGTGKQEFGLWMAKHVFCTNLVNQQPCNECHNCVRINE 8686 

Query: 71 GTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLK 130

HPD +AP+ G+ T+ V+ +RE+ + ++ KV + +A ++ AAN+LLK 

Sbjct: 8685 NEHPDVLRIAPD-GQ-TIKVNQIRELKAEFSKSGVETAKKVFLIQEADKMSTGAANSLLK 8512 

Query: 131 TLEEPPAETWFFLATREPERLLATLRSRCR-LHYLAGPPEQYAVTWLSREVTMSQDALLA 189 

LEEP + L T R+L T++SRC+ LH+ + + + + LLA 

Sbjct: 8511 FLEEPEGQILAILETTSLSRILPTIQSRCQTLHFQPLVKKTLIDRLIKQGIGEKTATLLA 8332 

Query: 190 ALRLSAGSPGAALALFQGDNW--QARETLCQALAYSVPSGD 228 

L S A+ + Q D W +ARE + Q Y + S D 

Sbjct: 8331 EL TNSFEKAVEISQ-DEWFNEAREIILQWFNY-LKSND 8224 



>6250 

Length = 24,587 



Minus Strand HSPs:

Score = 162 (57.0 bits), Expect = 2.9e-10, P = 2.9e-10
Identities = 41/134 (30%), Positives = 62/134 (46%), Frame = -2

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

HAL G G + ++ + C+Q+CCC + G D + +

Sbjct: 5419 HAYLFTGPRGTGKTSAAKIFAKAINCKHSQDGEPCNVCETCVAITEGRLNDVIEI--DAA 5246

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLEEPPAETWFFLA 144

N GV+ +R++ +K KV + + +L+ A NALLKTLEEPP F LA

Sbjct: 5245 SNN-GVEEIRDIRDKAKYAPTQAEYKVYIIDEVHMLSTGAFNALLKTLEEPPQNVIFILA 5069

Query: 145 TREPERLLATLRSR 158

T EP ++ T+ SR

Sbjct: 5068 TTEPHKIPLTIISR 5027





Parameters : 
B=5 

ctxfactor=6.00
E=10 

Query 

Frame ' MatID Matrix name 
+0 0 BLOSUM62 

Q=9,R=2 

Query 

Frame MatID Length Eff. Length E S W T X E2 S2 

+0 0 334 334 10. 59 3 13 22 0.069 37 

33 0.063 42 



- ; As Used Computed 

Lambda K H Lambda K H 

0.321 0.136 0.423 same same same 

0.244 0.0300 0.180 n/a n/a n/a 



Statistics : 

Database: /usr/local/db/e_faecalis
Title: /usr/local/db/e_faecalis
Release date: unknown 

Posted date: 12:53 PM EST Dec 11, 1998 
Format : BLAST 

# of letters in database: 3,209,119 

# of sequences in database: 293 

# of database sequences satisfying E: 2 
No. of states in DFA: 540 (57 KB) 
Total size of DFA: 97 KB (128 KB) 

Time to generate neighborhood: 0.00u 0.01s 0.01t Elapsed: 00:00:00
No. of threads or processors used: 1 

Search cpu time: 2.07u 0.01s 2.08t Elapsed: 00:00:02 

Total cpu time: 2.08u 0.03s 2.11t Elapsed: 00:00:02

Start: Wed Mar 17 09:11:29 1999 End: Wed Mar 17 09:11:31 1999 
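
The figures in this printout can be cross-checked with a short calculation. The sketch below assumes, from the numbers alone and not from any program documentation, that the bit score is the raw score multiplied by Lambda and divided by ln 2, using the second Lambda listed above (0.244), and that P is related to Expect by P = 1 - exp(-Expect). The function names are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float) -> float:
    """Raw score rescaled to bits, assuming bits = lambda * S / ln 2."""
    return lam * raw_score / math.log(2)


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, assuming a Poisson model."""
    return -math.expm1(-expect)  # 1 - exp(-E), accurate for very small E


if __name__ == "__main__":
    for raw in (210, 162):                      # High Scores reported for 6277 and 6250
        print(raw, round(bit_score(raw, 0.244), 1))
    for expect in (9.6e-16, 2.9e-10, 5.8):      # Expect values appearing in the printouts
        print(expect, p_from_expect(expect))
```

Under these assumptions the scores 210 and 162 convert to 73.9 and 57.0 bits, matching the values printed for contigs 6277 and 6250. For very small Expect values P is essentially equal to Expect, and for Expect = 5.8 it is close to 1, in line with the "Expect = 5.8, P = 1.0" entry in the earlier TBLASTN 1.3.9 report.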



The top-scoring match came from this contig (up to 1000 bp on either side of
the hit are shown) : 

>6277 (from 7224 to 9336) 

TTCAAACAACACATTAAGCGGCCACATAATCCCGAAATTTTGACAGGATTTAAAGATAAC 
CCTTGATCTTTAGCCATTTTGATTGAAACTGGCATAAAATCTCCTAGAAATGTTGAGCAA 
CATAGTTGTCTGCCACAAGGGCCAATGCCACCTAATATTTTCGCTTCATCTCGGACACCA 



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998]
Reference: Gish, Warren (1994-1997). unpublished.

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J.
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 

Query= deltaprime.ecoli
(334 letters)

Database: /usr/local/db/e_faecalis

293 sequences; 3,209,119 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done



                                                                    Smallest
                                                                      Sum
                                                    Reading  High  Probability
Sequences producing High-scoring Segment Pairs:     Frame    Score  P(N)      N

6277                                                 -1      210    9.6e-16   1
6250                                                 -2      162    2.9e-10   1
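In this report the P(N) column matches the Expect value for each hit. That is what the usual Poisson relationship P = 1 - exp(-E) gives when E is very small. The Python sketch below illustrates this. The relationship is the standard one for a single HSP (N = 1) and is assumed rather than quoted from the output, and `p_from_expect` is an illustrative name.

```python
import math


def p_from_expect(expect):
    """Poisson probability of at least one chance HSP when `expect` are expected."""
    # expm1 keeps precision for tiny E, where 1 - exp(-E) would lose digits.
    return -math.expm1(-expect)


for expect in (9.6e-16, 2.9e-10):
    print(f"E = {expect:.1e} -> P = {p_from_expect(expect):.1e}")
```

For E values near the search cutoff (E=10 in the Parameters section) the two quantities diverge, with P approaching 1.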



>6277 

Length = 9336

Minus Strand HSPs : 

Score = 210 (73.9 bits), Expect = 9.6e-16, P = 9.6e-16 
Identities = 62/218 (28%), Positives = 105/218 (48%), Frame = -1



+++L S++ GR HA L + G G +++++ C + C 



HPD +AP+ G+ T+ V+ +RE+ + ++ KV + +A + + AAN+LLK 



LEEP + L T R+L T+ + SRC+ LH+ + + + + LLA 



Query: 


11 


Sbjct : 


8865 


Query : 


71 


Sbjct: 


8685 


Query : 


131 


Sbjct: 


8511 


Query : 


190 


Sbjct: 


8331 



A+ + Q D W +ARE + Q Y + S D 



>6250 

Length = 24,587 



Minus Strand HSPs : 



Score = 162 (57.0 bits), Expect = 2.9e-10, P = 2.9e-10
Identities = 41/134 (30%), Positives = 62/134 (46%), Frame = -2




Query:    25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84
             HA L     G G  +     ++ + C+  Q  + C  C  C  +  G   D   +  +
Sbjct:  5419 HAYLFTGPRGTGKTSAAKIFAKAINCKHSQDGEPCNVCETCVAITEGRLNDVIEI--DAA 5246

Query:    85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLEEPPAETWFFLA 144
              N  GV+ +R++ +K          KV  +  +  +L+  A NALLKTLEEPP    F LA
Sbjct:  5245 SNN-GVEEIRDIRDKAKYAPTQAEYKVYIIDEVHMLSTGAFNALLKTLEEPPQNVIFILA 5069

Query:   145 TREPERLLATLRSR 158
             T EP ++  T+ SR
Sbjct:  5068 TTEPHKIPLTIISR 5027





Parameters:
  B=5
  ctxfactor=6.00
  E=10

  Query                        ----- As Used -----    ----- Computed ----
  Frame  MatID Matrix name     Lambda    K       H    Lambda    K       H
   +0      0   BLOSUM62        0.321   0.136   0.423  same    same    same
               Q=9,R=2         0.244   0.0300  0.180  n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length    E     S   W   T   X    E2     S2
   +0      0      334        334    10.    59   3  13  22  0.069    37
                                                       33  0.063    42



Statistics : 

Database: /usr/local/db/e_faecalis
Title: /usr/local/db/e_faecalis
Release date: unknown

Posted date: 12:53 PM EST Dec 11, 1998
Format: BLAST

# of letters in database: 3,209,119 

# of sequences in database: 293 

# of database sequences satisfying E: 2 
No. of states in DFA: 540 (57 KB) 
Total size of DFA: 97 KB (128 KB) 

Time to generate neighborhood: 0.00u 0.00s 0.00t Elapsed: 00:00:00
No. of threads or processors used: 1

Search cpu time: 2.06u 0.02s 2.08t Elapsed: 00:00:02

Total cpu time: 2.08u 0.03s 2.11t Elapsed: 00:00:02

Start: Wed Mar 17 10:15:00 1999 End: Wed Mar 17 10:15:02 1999 



The top-scoring match came from this contig (up to 1000bp on either side of
the hit are shown):

>6277 (from 7224 to 9336) 

TTCAAACAACACATTAAGCGGCCACATAATCCCGAAATTTTGACAGGATTTAAAGATAAC 
CCTTGATCTTTAGCCATTTTGATTGAAACTGGCATAAAATCTCCTAGAAATGTTGAGCAA 
CATAGTTGTCTGCCACAAGGGCCAATGCCACCTAATATTTTCGCTTCATCTCGGACACCA 



ATTTGACGTAACTCAATTCGCGTCCGGAAAATAGCCGCTAAGTCTTTGACTAATTCACGA 
AAATCAATTCGCCCATCTGCCGTAAAGTAAAAAATCATTTTGCTACGATCGAAGGTATAT 
TCTACTCGCACTAATTTCATTTTTAAGTCATGAGCTCGAATTTTTTCATTGGCAATGCTT 
TTGGCAGCTTCTGCATCAGCCAAATTTTTTTGTTCTTTTTCTAAATCATTGGCTGTTGCT 
TTATTTAAAATGGGTTTTAGGTCCTCTGGTAAATCGTCTGAATCGACTGTTTTTTTAGGA 
ATAGCAACAGTAGCTAATTGTTTTGACTGTTGAGATTCAACGAGTACTTTCTCATTATAA 
ATATACTCAGATTTTCCAGGAGCAAAATAATAGATATGACCGGCTTCACGGAAGCGAACT 
CCTACTACTTCTACCATTTTATTCCTCCTAATCTAGTTCAAGTGAACGTCGCTTCAATCG 
GTTAAGGAACTTGCTGAAACAACTAAGTTCCTACTATATTATGAAACTGAATGCCACTTG 
GCACTTTTTTCCTTTATGATTTAGGGTGAATCATTTGGATAACTAATTGTTCACAAACAT 
TTTGCCAACTAACATTGGCAGTCCATTTTTGGCGTGCTTTCAAAATTAGCGCCAACCGTT 
CCGCCTGCTCTTCCGTTACTTTTTTGGCTTGCTGTGTCGCAACACTTTCTTCCAATAATT 
GACGGTAATAAACCATGAGCAAGTCAAAGCTAAGCGCTTGTTGTTCTTTTTCGTTAAATA 
CTTTGACCATTTTCTTCTGAACGTAGATAAATGCCTGTAAATCATTACTTTTTAGATAAT
TAAACCATTGCAAAATGATTTCCCTAGCTTCATTAAACCATTCATCTTGAGAGATTTCAA 
CTGCTTTCTCAAAACTATTTGTCAGTTCAGCTAAAAGGGTTGCAGTCTTTTCACCAATCC 
CCTGTTTGATTAAGCGATCAATTAATGTTTTTTTGACTAATGGTTGAAAATGTAAGGTTT 
GGCATCGTGATTGAATCGTTGGTAAAATTCGAGAAAGCGAAGTGGTTTCTAAAATAGCTA 
AAATTTGTCCTTCTGGTTCTTCTAAAAATTTTAAGAGACTATTAGCTGCGCCGGTACTCA 
TTTTATCTGCTTCTTGAATTAAGAAAACTTTTTTAGCAGTCTCGACCCCACTTTTAGAAA 
ACTCCGCTTTTAATTCACGGATTTGGTTCACTTTGATGGTTTGCCCATCTGGCGCAATTC 
TTAAAACATCTGGATGTTCATTTTCATTAATCCGCACACAATTATGGCATTCGTTACAAG 
GCTGTTGATTTACTAAATTCGTACAAAAGACATGTTTCGCCATCCATAAGCCAAATTCTT 
GTTTTCCAGTTCCTGTATCTCCTTCAAAAAGATAAGCATGGGCAAGACGACCATGCTCAA 
AACTTTTTTGGAGTTGCTTGTACAGCAAAGGTTGCATTTGCTGTAGCTGTTGTGCTTCAT 
TCATCTTAATATTGATGGAATCCTTCAACTGGTAAGACGAAGCAAGTAGCGCCGCCTACT 
TCAACTTCCACAGGATAAGGAATTTGGCCATCCATTGTGATATCTAAAGTCACAGGTGTT 
GAAACATATTGTTTTCTTGATTGACATGTTTCTTTAATTAAAGCTAATGTTTCGTCGACA 
CGTTCATCATCAATCCCAATAATAAATGTGCTGTTTCCCGCTTTTAAGAACCCACCTGTT 
GAGGATAATTTTGTAGCACGAATATTGGCATCAATAAATTCGTTGGCTAATCGGTTACTA 
TCTTTGTCTTGTACAATGGCTAAAATAATCTTCATGGTCTACACCTTCCTATAATTAAAA 
GTTTTCTGGATAACGTTCAATAATCGCCTGATACGTTGCTTCTACGACAAGTTCTAAACT 
CATCCGTGCATCA 



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998]
Reference: Gish, Warren (1994-1997). unpublished.

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J.
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 



Query= delta prime
(334 letters)

Database: /usr/local/db/s_pneumoniae

270 sequences; 2,114,666 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done



                                                                    Smallest
                                                                      Sum
                                                    Reading  High  Probability
Sequences producing High-scoring Segment Pairs:     Frame    Score  P(N)      N

sp_68                                                -3      179    2.4e-12   1
sp_36                                                +1      176    5.3e-12   1



>sp_68 

Length = 21,744 



Minus Strand HSPs: 



Score = 179 (63.0 bits), Expect = 2.4e-12, P = 2.4e-12 
Identities = 66/236 (27%), Positives = 109/236 (46%), Frame = -3 



HA L G G ++ + + + C G + C +C CQ + G+ 



N GVD +RE+ +K L KV + + +L+ A NALLKTLEEP F LA 



T E ++ AT+ SR R + + + ++ L +E ++ +A+ + A R 



Query : 


25 


Sbjct: 


17440 


Query: 


85 


Sbjct: 


17266 


Query: 


145 


Sbjct: 


17089 


Query : 


199 


Sbjct: 


16909 


Query : 


253 


Sbjct: 


16738 



- ALALFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQAPARLHWLATLL 252 
AL+L QG+ + + + + + + +AAL+ + P L L LL 



>sp_36 

Length = 43,015



Plus Strand HSPs : 

Score = 176 (62.0 bits), Expect = 5.3e-12, P = 5.3e-12 
Identities = 50/205 (24%), Positives = 89/205 (43%), Frame = +1 

Query: 6 WLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGC 65 

W F++ V + + +HA L + L++ L C G C CR C 

Sbjct: 23515 WQPAQFDRFWILEQDQLNHAYLFSGF--FESLEMAQFLAKSLFCTDKVGVLPCEKCRSC 23688

Query: 66 QLMQAGTHPDYYTLAPEKGKNTLGVIDAVRWTEKLNEHARLGGAKVVWTDAALLTDA^ 125 

+L+ + G PD + P + + +RE+ + ++ +V + A + AA 

Sbjct: 23689 KLIEQGEFPDVTLIKPW--QVIKTERIRELVGQFSQAGIESQQQVFIIEQADKMHPNAA 23862 

Query: 126 NALLKTLEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMSQD 185 

N+LLK +EEP +E + F T + E++L T+RSR ++ + E+ + L + + + 

Sbjct: 23863 NSLLKVIEEPQSEWIFFLTSDEEKMLPTIRSRTQIFHFK-KQEEKLILLLEQMGLVKKK 24039 

Query: 186 ALLAALRLSAGSPGAALALFQGDNW 210

ALA+S A Q W 

Sbjct: 24040 ATLLA- KFSQSRAEAEKLANQASFW 24111 



Parameters:
  B=5
  ctxfactor=6.00
  E=10

  Query                        ----- As Used -----    ----- Computed ----
  Frame  MatID Matrix name     Lambda    K       H    Lambda    K       H
   +0      0   BLOSUM62        0.321   0.136   0.423  same    same    same
               Q=9,R=2         0.244   0.0300  0.180  n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length    E     S   W   T   X    E2     S2
   +0      0      334        334    10.    57   3  13  22  0.069    37
                                                       33  0.063    42





Statistics : 



Database: /usr/local/db/s_pneumoniae
Title: /usr/local/db/s_pneumoniae
Release date: unknown

Posted date: 12:57 PM EST Dec 11, 1998
Format: BLAST

# of letters in database: 2,114,666 

# of sequences in database: 270 

# of database sequences satisfying E: 2 
No. of states in DFA: 540 (57 KB) 
Total size of DFA: 97 KB (128 KB) 

Time to generate neighborhood: 0.00u 0.00s 0.00t Elapsed: 00:00:00
No. of threads or processors used: 1 

Search cpu time: 1.44u 0.01s 1.45t Elapsed: 00:00:02 

Total cpu time: 1.45u 0.02s 1.47t Elapsed: 00:00:02 

Start: Wed Mar 17 09:13:52 1999 End: Wed Mar 17 09:13:54 1999 



articles 




The complete genome of the
hyperthermophilic bacterium
Aquifex aeolicus

Gerard Deckert*†, Patrick V. Warren*†, Terry Gaasterland‡, William G. Young*, Anna L. Lenox*, David E. Graham§,
Ross Overbeek‡, Marjory A. Snead*, Martin Keller*, Monette Aujay*, Robert Huber‖, Robert A. Feldman*,
Jay M. Short*, Gary J. Olsen§ & Ronald V. Swanson*

* Diversa Corporation, 10665 Sorrento Valley Road, San Diego, California 92121, USA
‡ Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
§ Department of Microbiology, University of Illinois, Urbana, Illinois 61801, USA
‖ Lehrstuhl für Mikrobiologie, Universität Regensburg, W-8400 Regensburg, Germany






Aquifex aeolicus was one of the earliest diverging, and is one of the most thermophilic, bacteria known. It can grow on hydrogen, 
oxygen, carbon dioxide, and mineral salts. The complex metabolic machinery needed for A. aeolicus to function as a
chemolithoautotroph (an organism which uses an inorganic carbon source for biosynthesis and an inorganic chemical energy
source) is encoded within a genome that is only one-third the size of the E. coli genome. Metabolic flexibility seems to be
reduced as a result of the limited genome size. The use of oxygen (albeit at very low concentrations) as an electron acceptor is 
allowed by the presence of a complex respiratory apparatus. Although this organism grows at 95 °C, the extreme thermal limit of 
the Bacteria, only a few specific indications of thermophily are apparent from the genome. Here we describe the complete 
genome sequence of 1,551,335 base pairs of this evolutionary and physiologically interesting organism. 



Complete genome sequences have been determined for a number of
organisms, including Archaea 1, Bacteria 2-7, and Eukarya 8. Here we
present and explore the genome sequence of Aquifex aeolicus. With
growth-temperature maxima near 95 °C, Aquifex pyrophilus and
A. aeolicus are the most thermophilic bacteria known. Although
isolated and described only recently 9, these species are related to
filamentous bacteria first observed at the turn of the century,
growing at 89 °C in the outflow of hot springs in Yellowstone
National Park 10,11. The observation of these macroscopic assemblages
would later be instrumental in the drive to culture hyperthermophilic
organisms 12.

The Aquificaceae represent the most deeply branching family
within the bacterial domain on the basis of phylogenetic analysis of
16S ribosomal RNA sequences 13,14, although analyses of individual
protein sequences vary in their placement of Aquifex relative to
other groups 15-18. The genera in this group, Aquifex and
Hydrogenobacter, are thermophilic, hydrogen-oxidizing, microaerophilic,
obligate chemolithoautotrophs 9,19-21. A. aeolicus (isolated by
R.H. and K. O. Stetter) was cultured at 85 °C under an H2/CO2/O2
(79.5:19.5:1.0) atmosphere in a medium containing only inorganic
components. A. aeolicus does not grow on a number of organic
substrates, including sugars, amino acids, yeast extract or meat
extract. Unlike its close relative A. pyrophilus, A. aeolicus has not
been shown to grow anaerobically with nitrate as an electron
acceptor in the laboratory.

From study of the physiology of the organism, several predictions
can be made. As an autotroph, A. aeolicus must have genes encoding
proteins for one or more modes of carbon fixation and a complete
set of biosynthetic genes. As autotrophy is a feature that is distributed
throughout the Archaea and Bacteria, most of the associated
genes are expected to be of ancient origin and clearly related
to those characterized elsewhere. The obligate autotrophy suggests a
biosynthetic rather than a degradative character. Oxygen respiration
implies the presence of corresponding utilization and tolerance
genes. The early divergence of the Aquificaceae inferred from
ribosomal RNA sequences leads to several questions. Are the
machineries for oxygen usage and tolerance homologous to those
found in mitochondria and well studied organisms such as
Escherichia coli, or were they invented separately? If there was far
less oxygen when the lineage originated, is there evidence for use of
alternative oxidants?

[illegible] Bioinformatics Services, PO Box 90273, San Diego, California 92169, USA
Department of Bioinformatics, SmithKline Beecham Pharmaceuticals, Collegeville, [illegible], USA (P.V.W.)

Genome 

General features of the A. aeolicus genome are listed in Box 1. We 
classified 1,512 open-reading frames (ORFs) into one of three 
categories, namely, identified (Table 1), hypothetical, or unknown. 
Identified ORFs were further classified into one of 57 cellular role 
categories adapted from Riley 22 (Table 1). The relatively high G + C 
content of the two 16S-23S-5S rRNA operons (65%) is character- 
istic of thermophilic bacterial rRNAs 23 . The genome is densely 
packed: most genes are apparently expressed in polycistronic 
operons and many convergently transcribed genes overlap slightly. 
Nonetheless, many genes that are functionally grouped within 
operons in other organisms, such as the tryptophan or histidine 
biosynthesis pathways, are found dispersed throughout the A. 
aeolicus genome or appear in novel operons. Even when they 
encode subunits of the same enzyme, the genes are often separated 
on the chromosome (for example, gltB and gltD, the genes encoding
the large and small subunits of glutamate synthase). Operon 
organization of genes for the biosynthesis of amino acids is found 
in both Archaea and Bacteria but it is not universal in either group. 
A. aeolicus is extreme in that no two amino acid biosynthetic genes 
are found in the same operon. In contrast, genes required for 
electron transport, hydrogenase subunits, transport systems, ribo- 
somal subunits, and flagella are often in functionally related 
operons in A. aeolicus (Fig. 1). No introns or inteins (protein 
splicing elements) were detected in the genome. 

A single extrachromosomal element (ECE) was identified during
sequencing. Sequence redundancy for the total project was calculated
to be 4.83. The ECE, however, is significantly over-represented
relative to the chromosome; when calculated independently for the
final assemblies, redundancies are 4.73 and 8.76 for the chromosome
and for the ECE, respectively. The ECE therefore appears to be
present at roughly twice the copy number of the chromosome.
Although no ORFs on the ECE can be assigned a function with 
confidence, except for a transposase, two of the predicted proteins 
show similarity to hypothetical proteins in the Methanococcus 
jannaschii genome 1 . One ORF on the ECE is also present in two 
identical copies on the A. aeolicus chromosome, providing evidence 
of genetic exchange between the chromosome and the ECE. 
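The copy-number estimate above follows directly from the two redundancy figures. A minimal Python sketch of the arithmetic is given here, assuming that sequence redundancy scales linearly with copy number in the sequenced DNA. The function name `relative_copy_number` is illustrative.

```python
# Redundancy (fold coverage) values quoted in the text for the final assemblies.
CHROMOSOME_REDUNDANCY = 4.73
ECE_REDUNDANCY = 8.76


def relative_copy_number(element_redundancy, reference_redundancy):
    """Estimate copies of an element per copy of the reference replicon."""
    return element_redundancy / reference_redundancy


ratio = relative_copy_number(ECE_REDUNDANCY, CHROMOSOME_REDUNDANCY)
print(f"ECE copies per chromosome: {ratio:.2f}")
```

A ratio somewhat below 2 is consistent with the "roughly twice" wording of the text.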

Reductive tricarboxylic acid cycle 

As an autotroph, A. aeolicus obtains all necessary carbon by fixing
CO2 from the environment. An assay for activity of the reductive
tricarboxylic acid (TCA) cycle in A. pyrophilus cell extracts showed
in vitro activities for each proposed reaction 24. The reductive
(reverse) TCA cycle fixes two molecules of CO2 to form acetyl-coenzyme A
(acetyl-CoA) and other biosynthetic intermediates 25.
The A. aeolicus genome contains genes encoding malate dehydrogenase,
fumarate hydratase, fumarate reductase, succinate-CoA
ligase, ferredoxin oxidoreductase, isocitrate dehydrogenase, aconitase
and citrate synthase, which together could constitute the TCA
pathway. There is no biochemical evidence for alternative carbon-fixation
pathways in A. pyrophilus 24,25 nor is there sequence evidence
for such pathways in A. aeolicus.

The TCA cycle is vital as it provides the substrates of many 
biosynthetic pathways. (It is beyond the scope of this report to detail 
these biosynthetic pathways, but they seem to be typically bacterial, 
and candidate genes for all or most of the enzymes have been 
identified in A. aeolicus.) The central role of the TCA cycle is 
emphasized by duplication of many of its constituent genes in 
A. aeolicus. Two genes encode proteins that are similar to malate
dehydrogenase (in addition to a lactate dehydrogenase). The fumarate
hydratase is split into amino- and carboxy-terminal subunits, as
is the case in M. jannaschii 1. Unlinked genes encoding two iron-sulphur
proteins of fumarate reductase (alternatively succinate
dehydrogenase) accompany a single flavoprotein subunit. Two
sets of genes resembling succinate-CoA ligase (both the α- and β-subunits)
are present. A. aeolicus has two putative operons encoding
four-subunit (α, β, γ, δ) 2-oxoacid ferredoxin oxidoreductases; members
of this family catalyze reversible carboxylation/decarboxylation
of pyruvate, 2-isoketovalerate, or 2-oxoglutarate with varying
specificity 26. These duplicated genes may encode paralogous proteins
with unique substrate specificity, as opposed to redundant
functions. For example, a paralogue of succinate-CoA ligase may
activate citrate with coenzyme A to form citryl-CoA, which citrate
synthase can cleave to produce oxaloacetate and acetyl-CoA.

Gluconeogenesis through the Embden-Meyerhof-Parnas 
pathway 

Growing autotrophically, A. aeolicus must synthesize pentose and 
hexose monosaccharides from products of the reductive TCA cycle. 
Pyruvate produced by pyruvate ferredoxin oxidoreductase or by 
pyruvate carboxylase (oxaloacetate decarboxylase) 24 may enter the 
Embden-Meyerhof-Parnas pathway of glycolysis and gluconeogenesis.
Genes encoding fructose-1,6-bisphosphatase, an essential
gluconeogenic enzyme in E. coli, have not been identified in the
genomes of the autotrophs A. aeolicus or M. jannaschii 1, suggesting
that an unidentified pathway may exist. The A. aeolicus genome also 
encodes enzymes of the pentose-phosphate pathway and enzymes 
for glycogen synthesis and catabolism. We found neither (phospho) 
gluconate dehydrase nor 2-keto-3-deoxy-(6-phospho)gluconate 
aldolase of the Entner-Doudoroff pathway. 

Respiration 

Aquifex species are able to grow by using oxygen concentrations as
low as 7.5 p.p.m. (R.H. and K. O. Stetter, unpublished observations).
The enzymes for oxygen respiration are similar to those of other
bacteria: ubiquinol cytochrome c oxidoreductase [illegible],
cytochrome c (three different genes) and cytochrome c oxidase
(with two different subunit I genes and two different subunit II
genes). The alternative system, with cytochrome bd ubiquinol
oxidase, is also present. Clearly, the Aquifex lineage did not independently
invent oxygen respiration. This leaves at least three
possibilities: consistent with the ability of Aquifex to use very low
levels of oxygen, the oxygen-respiration system was highly developed
when oxygen had only a small fraction of its present concentration
before the advent of oxygenic photosynthesis; contrary to
what is implied by the 16S phylogeny, the lineage including Aquifex
originated after the rise in atmospheric oxygen; or oxygen respiration
developed once, and was then laterally transferred among
bacterial lineages and acquired by Aquifex.
Many other oxidoreductases are present in addition to those
obviously involved in oxygen respiration. The physiological role of
most of these oxidoreductases is unknown or ambiguous, but two
deserve comment. There is a putative nitrate reductase in the
genome, although A. aeolicus has not been observed to perform
NO3- respiration, unlike the closely related A. pyrophilus. The nitrate
reductase gene is adjacent to a nitrate transporter, and may be
involved in nitrogen assimilation rather than respiration. It is also
possible that A. aeolicus has a latent ability to respire with nitrate but
that the conditions required have not been found. Two gene
sequences show strong similarities to Rieske proteins, even
though the rest of the ubiquinol cytochrome c oxidoreductase
subunits appear only once in the genome. One of these Rieske
protein genes is adjacent to a sulphide dehydrogenase subunit,
suggesting a role in sulphur respiration.

Oxidative stress

A. aeolicus grows optimally under microaerophilic conditions and
consequently possesses various protective enzymes to counter
reactive oxygen species, particularly superoxide and peroxide. The
genome contains three genes encoding superoxide dismutases, two
of the copper/zinc family and one of the iron/manganese family.
The latter has also been noted in A. pyrophilus 17. One of the copper/zinc
superoxide dismutase genes is located in a large gene cluster
encoding formate dehydrogenase.

No catalase genes were identified. There are several genes in the
genome that might encode proteins that catalyze the detoxification
of H2O2, including cytochrome c peroxidase, thiol peroxidase, and
two alkyl hydroperoxide reductase genes. All of these enzymes
require an exogenous reductant and therefore do not evolve O2.
However, treatment of A. pyrophilus 9 or A. aeolicus biomass with
H2O2 results in the rapid evolution of gas bubbles. This catalase
activity may result from a novel enzyme that cannot yet be identified
by sequence similarity.



Motility 

Like A. pyrophilus 9 , A. aeolicus is motile and possesses monopolar 
polytrichous flagella. More than 25 genes encoding proteins 
involved in flagellar structure and biosynthesis have been identified 
in A. aeolicus (Box 1). However, no homologues of the bacterial
chemotaxis system were identified. In enteric bacteria, membrane^ 
bound receptors bind chemoattractants and repellents and moi^ 

Figure 1 Linear map of the A. aeolicus circular chromosome. Genes are shown
as arrows which denote the direction of transcription and are coloured to denote
functional categorization according to the key below the figure. The sequences of
the two rRNA gene clusters are identical. Here, the first base of the coding
sequence of fusA was arbitrarily assigned as base number 1 as no origin of
replication has been identified. ORF numbers are discontinuous because some
ORFs representing 100 amino acids or more are not predicted to be coding and
are not shown.
rhdA

sor 

soxfi 



Polvamines 
Aq728 
Aq062 

Sulfur 
Aql 08! 
Aql076 
Aql 799 
Aq453 
Aq!803 

Cofactor Biosynthesis 
Lipoic acid biosynthesis 
Aqt355 lipA 

Biotin 
Aql70 
Aq975 
Aq557 
Aq626 
Aql639 



Aq566 

Folic acid 
Aq2045 
Aq!898 
Aq239 
Aql 62 

AqH68 
Aql 144 
Aql 606 

Heme 
Aq207 
Aql 237 
Aq334 
Aq8l6 
Aql 279 

Aq2t09 
Aq263 
Aql 424 



bioA 
bioB 
bioD 
bioF 
bioW 

birA 

foIC 
folD 
folE 
folK 

folP 
pabB 
pjbC 

cobA 
cysC 
dcuP 

hemA 

hemB 
hemC , 
hemF 



flagellar hook associated protein FliD 
Flagellar M-ring protein 
flagellar switch protein FliG 
flagellar export protein 
flagellar biosynthesis FliL 
flagellar switch protein FliN 
flagellar biosynthetic protein FliP 
flagellar biosynthesis protein FliQ 
flagellar biosynthetic protein FliR
flagellar protein FliS 
flagellar motor protein MotA 
flagellar motor protein MotB 
flagellar motor protein MotB-like 

signal recognition particle receptor protein 
general secretion pathway protein D 
general secretion pathway protein E 
general secretion pathway protein G 
type- 1 signal peptidase 
lipoprotein signal peptidase 
processing protease 
fimbrial assembly protein PilC
fimbrial assembly protein PilC
type 4 prepilin peptidase
twitching motility protein PilT
twitching mobility protein
preprotein translocase SecA subunit
protein export membrane protein SecD
protein-export membrane protein
preprotein translocase SecY 
proteinase IV 

type IV pilus assembly protein TapB 
trigger factor 



5,10-methylenetetrahydrofolate reductase
S-adenosylmethionine synthetase
S-adenosylhomocysteine hydrolase

cellulose synthase catalytic subunit 
endoglucanase fragment 
glycogen synthase 
1,4-alpha-glucan branching enzyme 
glycogen phosphorylase 
4-alpha-glucanotransferase (amylomaltase) 



Aq20IS 
Aq948 
AqUv9 
Aq2l24 

Molybdopterin 
AqllHJ moaA, 



hemC 
hemH 
hemK 
hemN 



aconitase
ferredoxin oxidoreductase alpha subunit
ferredoxin oxidoreductase alpha subunit
ferredoxin oxidoreductase beta subunit
ferredoxin oxidoreductase beta subunit
ferredoxin oxidoreductase gamma subunit
ferredoxin oxidoreductase gamma subunit
fumarate reductase flavoprotein subunit
reductase iron-sulfur subunit
fumarate reductase iron-sulfur subunit
fumarate hydratase (fumarase)
C-terminal fumarate hydratase, class I
citrate synthase
isocitrate dehydrogenase
malate dehydrogenase
malate dehydrogenase
oxaloacetate decarboxylase alpha chain
succinyl-CoA ligase beta subunit
succinyl-CoA ligase beta subunit
succinyl-CoA ligase alpha subunit
succinyl-CoA ligase alpha subunit

phosphate starvation-inducible protein 
inorganic pyrophosphatase 
exopolyphosphatase 

ornithine decarboxylase 
spermidine synthase 

sulfate adenylyltransferase
thiosulfate sulfurtransferase
thiosulfate sulfurtransferase
sulfur oxygenase reductase
sulfur oxidation protein SoxB



Lipoic acid synthetase
DAPA aminotransferase
biotin synthetase
dethiobiotin synthetase
8-amino-7-oxononanoate synthase
6-carboxyhexanoate-CoA ligase (pimeloyl-CoA synthase)
biotin [acetyl-CoA-carboxylase] ligase
folylpolyglutamate synthetase
methylenetetrahydrofolate dehydrogenase
GTP cyclohydrolase I
folate biosynthesis 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase
dihydropteroate synthase
p-aminobenzoate synthetase
aminodeoxychorismate lyase
uroporphyrin-III C-methyltransferase
siroheme synthase
uroporphyrinogen decarboxylase
glutamate-1-semialdehyde aminotransferase
glutamyl tRNA reductase (delta-aminolevulinate synthase)
porphobilinogen synthase
porphobilinogen deaminase
oxygen-independent coproporphyrinogen III oxidase
protoporphyrinogen oxidase
ferrochelatase
protoporphyrinogen oxidase
oxygen-independent coproporphyrinogen III
molybdenum cofactor biosynthesis protein



24.3% 

32.0%. 

35.9% 

44.6% ., 

30.6% ., 

42.9% ., 

47.7%., 

45.5%.. 

29.7%., 

30.8%.. 

35.0% 

36.8% .. 

27.5% 

49.1%., 

27.5% 

48.8% 

50.7% .. 

33.9% .. 

37.4%.. 

28.7% .. 

37.4% .. 

28.9% 

34.8% 

51.4%.. 

41.6%.. 

44.9% .. 

36.0% .. 

41.4%.. 

44.2%.. 

43.4%.. 

42.2% .. 

27.4%.. 



43.3% . 
49.2% . 
60.9%. 

39.5%. 
33.0%. 
3S.1%. 
56.5%. 
37.CS. 
43.4% ., 

36.1%.. 
31.5%., 
32.3%., 
29.6%., 
31.3%., 
34.5%., 
34.5% .. 
31.4%., 
35.2% ., 
35.1%., 
46.4% .. 
40.4% .. 
33.0% ., 
46.0% .. 
49.3% ., 
46.9%.. 
50.1% .. 
35.1%., 
52.9% .. 
41.7% .. 
65.7% .. 

47.1%.. 
5o.5% .. 
33.6% .. 

30.9% .. 
48.4% ., 



31.7%. 
36.7% . 
41.3% . 



51.7% ., 
42.0% . 
41.5%., 
45.1%., 

47.3% ., 
37.5% ., 

3!. 8%.. 
53.:%.. 
57.1%,. 

43.7% .. 
45.8% 
41.3%.. 
29.0% .. 

52.1%.. 
36.9%.. 
4i.4% ., 
56.5'% .. 

3$.7% .. 
64.5% .. 
53.1% 

33.1%.. 
30.3% 
46.4% .. 
32.2% ., 
50.2% 







Aq2181 moaE
Aq1326 mobB
Aq030 moeA1
Aq1329 moeB
Aq061 mog
Aq049 phhB

Pantothenate
Aq815 dfp
Aq1973 panB
Aq2132 panC
Aq476 panD

Pyridine nucleotides
Aq1889 nadA
Aq777 nadB
Aq869 nadC
Aq959 nadE

Pyridoxal phosphate
Aq852 pdxA
Aq1423 pdxJ

Quinones
Aq895 ispB
Aq052 ubiA

Riboflavin
Aq350 ribA
Aq1707 ribC
Aq138 ribD1
Aq436 ribD2
AqU9 ribF
Aq132 ribH

Thiamine
Aq1204 thiC
Aq1960 thiD
Aq1366 thiE1
Aq558 thiE2
Aq2178 thiG
Aq2119 thiL


Thio- and glutaredoxin
Aq443 gua
Aq1916 trxA1
AqlSII trxA2
Aq500 trxB

Energy Metabolism
Aq1342 gph

ATP-Proton Motive Force



Aq679 
Aql79 
Aq673 
Aq2038 
Aql 77 
Aql586 
Aql 587 
Aq2041 
AqI58S 

Dehydrogenases 
Aql362 
Aql 240 
Aql86 
Aq227 
Aql 145 
Aq232 
Aql 769 
Aql 234 
Aql 232 
Aql 231 
Aql051 
Aql039 
Aql046 
Aql 049 
Aql9f)3 
Aql 109 
Aql 639 
Aq395 
Aq400 
Aq398 
Aq961 
Aq038 
Aq727 
Aq736 
Aq2l7 
Aq206 
Aq833 
Aq024 
Aql 35 
AqlOlO 

Electron 
Aq2l9l 
Aq2192 
Aq2l90 
.■\q2l88 
Aql53 
Aq042 
Aq792 
Aql 550 
Aql357 
Aql 358 
Aq067 
Aq235 
Aq9l9a 
.Aqll7la 
Aql 192a 
AqlOSa 
Aq2ll 
Aq2096 
Aq045 
Aq044 
Aq234 
Aq2186 

GIwroK-su 
Aq484' 
Aql 390 
Aql0*5 
Aq434 
Aql 744 
Aql 



atpA 
atpB 
arpC 
atpD 
atpE 
atpFl 
arpF2 
atpG 
atpH 



adhl 
idh2 
aldHl 
a!dH2 
dhaT 
dhst* 
did I 
dmsA 
dmsB 
dmsC 
fdhE 
fd.iG 
fdoH 
tdol 
gcsPl 
<csP2 

&ipc 

hdrA 
hdrB 
hdrC 
hdrD 
hibD 
WhA 
lpd\ 
narB 
nirB 
nux 
nsd 
nueM 
udh 

transport 

coxA 1 
coxA2 
coxB 
coxC 
c:aA 
ox 
ocBl 
ovB2 
cvdA 
cv^B 
dmsB 
t'ccB" 
Idxl 
fdx2 
fdx3 
fdx4 
flip 
floX 
petA 
petB 
soxF 
*qr 

and gluconeogeneNii 
eno 
tba 
fjp 

lE-mA 
i-pA 



molybdenum cofactor biosynthesis moaC 45.0% . 

molybdopterin converting factor subunit 2 39 J% . 
molybdopterin-guainine dinucleotide 

biosynthesis protein B 44.4% . 

molybdenum cofactor biosynthesis protein A 36.8% . 

molybdopterin biosynthesis protein MoeB 54. 1 % . , 

molybdenum cofactor biosynthesis MOG 55 j% ., 

pterm-4a-caihinoUmine dehydratase 37.9% ., 

pantothenate metabolism flavoprotein 41.2% 

3- methyt-2 -oxobutanoate 

hyilroxymcthyltransferase 45.5% 

pantothenate synthetase 47.4% ., 

aspartate 1 -decarboxylase 46.0% ., 

quinoltnate synthetase A 44J% 

L-aspartate oxidase 36.7% 

quinolinate phosphoribosyl transferase 47.0% .. 

NHI3)*depcndent NAD+ synthetase 39.6% .. 

py ridoxal phosphate biosynthetic protein PdxA 36.8% . . 

pyridoxal phosphate synthetase 88.2% 

octoprenyl-diphosphate synthase 35.7% .. 

4- hydroxybenzoate octaprenyuransferase 4 1 .4% 

GTP cyclohydrolase II 61.7% .. 

riboflavin synthase alpha chain 45 J% .. 

riboflavin specific deaminase 46.0% .. 

riboflavin specific deaminase 42.9% 

riboflavin kinase 38.4% .. 

riboflavin synthase beu subuntt . 51.0%.. 

thiamine biosynthesis protein 67.]% 

HMP-P kinase 40.5%.. 

thiamine phosphate synthase 36.3% . . 

thiamine phosphate synthase 39.5% .. 

thiamine biosynthesis, thiazole moiety 52.5% .. 

thiamine monophosphate kinase 34.5% .. 

gluiaredoxin -like protein 33.8%.. 

thioredoxin 58.9% .. 

thioredoxin 32.2% .. 

thioredoxin reductase 39.8% .. 

phosphoglycolate phosphatase 33.9*% .. 

ATP synthase F1 alpha subunit 64.3% .. 

ATP synthase F0 subunit a 36.4% .. 

ATP synthase F1 epsilon subunit 37.4% .. 

ATP synthase F1 beta subunit 67.4% ... 

ATP synthase F0 subunit c 53.8% .., 

ATP synthase F0 subunit b 26.3% .. 

ATP synthase F0 subunit b 25 J% , 

ATP synthase F1 gamma subunit 39.9% ... 

ATP synthase F1 delta chain 28.1% . . , 

alcohol dehydrogenase 35.4% 

alcohol dehydrogenase 28.8% .., 

aldehyde dehydrogenase 41.9% ... 

aldehyde dehydrogenase 28.0% ... 

1,3-propanediol dehydrogenase 36.6% ... 

flavocytochrome c sulfide dehydrogenase 33.6% ... 

D-lactate dehydrogenase 43.3% . . . 

DMSO reductase chain A 25.0% ... 

DMSO reductase chain B 38.4% ... 

DMSO reductase chain C 29.5% . 

formate dehydrogenase formation protein FdhE 25.9% ... 

formate dehydrogenase alpha subunit 50.0% ... 

formate dehydrogenase beta subunit 45.7% ... 

formate dehydrogenase gamma subunit 38.4% ... 

glycine dehydrogenase (decarboxylating) 49.6% 

glycine dehydrogenase (decarboxylating) 46.8% ... 

oxido/reductase iron sulfur protein 27.1% ... 

heterodisulfide reductase subunit A 39.7% ... 

heterodisulfide reductase subunit B 32.5% ... 

heterodisulfide reductase subunit C 35.7% . 

heterodisulfide reductase 29.5% ... 

3-hydroxyisobutyrate dehydrogenase 34.6% ... 

D-lactate dehydrogenase 33.5% ... 

dihydrolipoamide dehydrogenase 37.0% ... 

nitrate reductase narB 39.1% ... 

nitrite reductase (NAD(P)H) large subunit 35.3% ... 

NADH oxidase 33.1%... 

nucleotide sugar dehydrogenase 47.0% ... 

NADH dehydrogenase (ubiquinone) 28.2% ... 

dehydrogenase 29.7% ... 

cytochrome c oxidase subunit I 42.4% ... 

cytochrome c oxidase subunit I 38.1% ... 

cytochrome c oxidase subunit II 27.4% 

cytochrome c oxidase subunit III 28.6% 

heme O oxygenase 28.1% ... 

cytochrome c 25.8%... 

cytochrome c552 29.9%.. 

cytochrome C552 38.7% ... 

cytochrome oxidase d subunit I 38.8% 

cytochrome oxidase d subunit II 31.2% .... 

dimethylsulfoxide reductase chain B 40.2% .... 

sulfide dehydrogenase, flavoprotein subunit 38.0% ... 

ferredoxin ' 37.1%... 

ferredoxin 43.9% .. 

ferredoxin 35.0% ... 

ferredoxin 56.6% ... 

flavohemoprotein 43.4% .... 

flavodoxin 32.5% .... 

Rieske-1 iron sulfur protein 34 J% .... 

cytochrome b 38.3% ... 

Rieske-I iron sulfur protein 29.0% .... 

sulfide-quinone reductase 41.0% .... 

enolase 65.0% .... 

fructose-1,6-bisphosphate aldolase class II 39.9% .... 

glyceraldehyde-3-phosphate dehydrogenase 59.5% .... 

glycerol kinase 51.0% .... 

phosphoglycerate mutase 27.9% .... 

glycerol-3-phosphate dehydrogenase (NAD+) 40.5% 



Aql 708 


pfkA 


A<]750 


pgi 


AqllS 


P& 


Aql990 


pgmA 


Aq50l 


pmu 


Aq2142 


ppsA 




pycA 


Aql5l7 


pycB 


Aq360 


limA 


Hydrogenase 




Aq665 


hoxZ 


Aq667 


hupD 


Aq666 


hupE 


Aql021 


hypA 


Aq671 


hypB 


Aqtl57 


hypD 


Aq662 


mbhLl 


Aq960 


mbhU 


Aq804 


mbhU 


Aq660 


mbhSl 






AqS02 


mbhS3 


Aql591 


shyS 


Sugar metabolism 


Aq%8 


cbbE2 


Aql65S 


fucAl 


Aql979 


fucA2 


Aq498 


gnd 


Aq497 


gsdA 


Aql 138 


rpiB 


Aql 19 


lalC 


Aql 765 


iktA 


NADH dehydrogenase 


Aql 385 


nuoAl 


Aql310 


nuuA2 


Aql312 


nunB 


Aq551 


nuoDl 


Aql314 . 


nuoD2 


Aq574 


nuoE 


Aq573 


nuoF 


Aq437 


nuoG 


Aql315 


nuoHl 


Aql 373 


nuoH2 


Aql 374 


nuoH3 


Aql317 


nuoll 


Aq1375 


nuol2 


Aql 318 


nuoll 


Aql377 


nuoJ2 


Aql319 


nuoKl 


Aql 378 


nuoK2 


Aql 320 


nuoLl 


Aq866 


nuoL2 


Aql 379 


nuoL3 


Aql321 


nuoM 1 


Aql 382 


nuoM2 


Aql 322 


nuoN 1 


Aql 383 


nuoN2 


lipid metabolism 


Aq2058 


aas 


Aql 206 


accA 


Aql 363 


accB 


Aql 664 


accCl 


Aql470 


accC2 


Aq445 


accD 


Aql 7 17a 


acpP 


AqS13 


acpS 


Aq2104 


acs 


Aq2103 


acs* 


Aql 249 


cds 


Aql 737 


cfa 


Aq892 


fabD 


Aql7l7 


fabF 


Aql716 


fabG 


Aql099 


fabH 


Aql 552 


fabl 


Aq056 


fabZ 


Aq999 


fadD 


Aql 638 


IplA 


Aq958 


pgsA 


Aq2l54 


pgsA 


AqllOl 


plsX 


Purines, Pyrimidines, Nucleotides and Nucleosides 


Aqtf94 


nrdA 


Aql 505 


nrdF 


Purines 




Aq568 


deoD 


Aq236 


guaA 


Aq2023 


guaB 


Aq544 


hpt 


Aq078 


kad 


Aql 590 


ndk 


Aql 636 


prs 


Aql 290 


purA 


Aq597 


purB 


Aq2117 


purC 


Aq742 


purD 


Aql 178 


purE 


Aql 175 


purF 


Aql 963 


purH 


Aq245 


purK 


Aql 836 


purL 


Aq769 


pur.M 


Aq857 


purN 


Aql 105 


purQ 


Aql818 


purl/ 


Pyrimidines 




Aq410 


carA 


Aql 172 


carB 


Aq2101 


carB 


Aq2153 


cmk 


Aql 607 


ded 


Aq220 


dul 


Aq409 


pyrB 


Aq806 


pyrC 



phosphofructokinase 49.4% ... 

glucose-6-phosphate isomerase 37.8% ... 

phosphoglycerate kinase 34.3% . . . 

phosphoglycerate mutase 33.2% ... 

phosphoglucomutase/phosphomannomutase 37.8% ... 

phosphoenolpyruvate synthase 56.3% ... 

pyruvate carboxylase c-terminal domain 46.6% ... 

pyruvate carboxylase n-terminal domain 57.1% ... 

triose phosphate isomerase 52.2% ... 

Ni/Fe hydrogenase B-type cytochrome subunit 40.4% ... 

HupD hydrogenase related function 40.9% ... 

HupE hydrogenase related function 38.3% ... 

hydrogenase accessory protein HypA 39.8% ... 

hydrogenase expression/formation protein B 50.6% ... 

hydrogenase expression/formation protein HypD 56.1% ... 

hydrogenase large subunit 50.6% ... 

hydrogenase large subunit 44.3% ... 

hydrogenase large subunit 27.9% ... 

hydrogenase small subunit 66.6% ... 

hydrogenase small subunit 51.3%... 

hydrogenase small subunit 36.7% ... 

soluble hydrogenase small subunit 41.6% ... 

ribulose-5-phosphate 3-epimerase 47.2% ... 

fuculose-1 -phosphate aldolase 31.8% ... 

fuculose-1 -phosphate aldolase 29.7% ... 

6-phosphogluconate dehydrogenase 45.2% ... 

glucose-6-phosphate 1 -dehydrogenase 32.3% ... 

ribose 5-phosphate isomerase B 34.5% ... 

transaldolase 71.1% ... 

transketolase 52.4% ... 

NADH dehydrogenase I chain A 42.0"« ... 

NADH dehydrogenase I chain A 44.9% ... 

NADH dehydrogenase I chain B bO.1% ... 

NADH dehydrogenase 1 chain D 37.7% ... 

NADH dehydrogenase I chain D 42.2% ... 

NADH dehydrogenase I chain E 36.8% ... 

NADH dehydrogenase I chain F 20.5% .. 

NADH dehydrogenase I chain G 35.4% .. 

NADH dehydrogenase 1 chain H 41.0% ... 

NADH dehydrogenase I chain H 42. 1% ... 

NADH dehydrogenase 1 chain H 38.9% .., 

NADH dehydrogenase I chain 1 30.5% ... 

NADH dehydrogenase I chain I 29.2% ... 

NADH dehydrogenase I chain I 35.4% ... 

NADH dehydrogenase 1 chain I 30.6% ... 

NADH dehydrogenase I chain K 51.1% ... 

NADH dehydrogenase I chain K 48-4% ••• 

NADH dehydrogenase 1 chain L 39.0% ... 

NADH dehydrogenase I chain L 30.2% ... 

NADH dehydrogenase I chain L 43.1% ... 

NADH dehydrogenase 1 chain M 43.6% ... 

NADH dehydrogenase I chain M 36.9% 

NADH dehydrogenase I chain N 34.1% ... 

NADH dehydrogenase I chain N 318% ... 

2-acylglycerophosphoethanolamine acyltransferase 37.1% ... 

acetyl-CoA carboxylase alpha subunit 57.1% ... 

biotin carboxyl carrier protein 44.6% .. 

biotin carboxylase 54.4% .. 

biotin carboxylase 56.5% .. 

acetyl-CoA carboxyltransferase beta subunit 56.9% .. 

acyl carrier protein 71.2% .. 

holo-[acyl-carrier protein] synthase 30.8% .. 

acetyl-coenzyme A synthetase 54.0% .. 

acetyl-coenzyme A synthetase c-terminal fragment 61.2% .. 

phosphatidate cytidylyltransferase 29.2% .. 

cyclopropane-fatty-acyl-phospholipid synthase 37.3% .. 

malonyl-CoA:acyl carrier protein transacylase 42.1% .. 

3-oxoacyl-[acyl-carrier-protein] synthase II 58.4% 

3-oxoacyl-[acyl-carrier-protein] reductase 52.9% .. 

3-oxoacyl-[acyl-carrier-protein] synthase III 47.0% .. 

enoyl-[acyl-carrier-protein] reductase (NADH) 49.6% .. 

(3R)-hydroxymyristoyl-(acyl carrier protein) dehydratase 58.7% .. 

long-chain-fatty-acid CoA ligase 30.0% .. 

lipoate-protein ligase A 28.1% .. 

phosphatidylglycerophosphate synthase 37.3% .. 

phosphatidylglycerophosphate synthase 38.9% .. 

PlsX protein 43.7% .. 

ides and Nucleosides 

ribonucleotide reductase alpha chain 35.0% .. 

ribonucleotide reductase beta chain 36.2% . 

purine nucleoside phosphorylase 33.1% .. 

GMP synthase 58.4% .. 

inosine monophosphate dehydrogenase 65.4% .. 

hypoxanthine-guanine phosphoribosyltransferase 48.2% .. 

adenylate kinase 50.0% . . 

nucleoside diphosphate kinase 48.2% . . 

phosphoribosylpyrophosphate synthetase 55.2% .. 

adenylosuccinate synthetase 49.2%.. 

adenylosuccinate lyase 52.4% . . 
phosphoribosylaminoimidazole-succinocarboxamide synthase 52.5% .. 

phosphoribosylamine-glycine ligase 54.2% 

phosphoribosylaminoimidazole carboxylase 64.6% 

amidophosphoribosyltransferase 42.7% 

phosphoribosylaminoimidazolecarboxamide formyltransferase 48.2% .. 

phosphoribosylaminoimidazole carboxylase 35.6% 

phosphoribosylformylglycinamidine synthase II 49.3% .. 

phosphoribosylformylglycinamidine cyclo-ligase 50.0% ., 

phosphoribosylglycinamide formyltransferase 48.3% .. 

phosphoribosylformylglycinamidine synthase I 51.1% .. 

formyltetrahydrofolate deformylase 56.3% .. 

carbamoyl phosphate synthetase small subunit 52.2% . , 

carbamoyl-phosphate synthase large subunit 60.7% ., 

carbamoyl-phosphate synthase, large subunit 63.1% . 

cytidylate kinase 38.5% .. 

deoxycytidine triphosphate deaminase 39.5% .. 
deoxyuridine 5'-triphosphate nucleotidohydrolase 

aspartate carbamoyltransferase catalytic chain 42.0% . 

dihydroorotase 37.3% . 



Aq046 
Aql 305 

Aql 580 
Aql334 
Aq'13 
Aq640 
Aq969 
Aql907 
Aq2163 

Regulation 
Aql 058 
Aq2179 
Aq281 
Aq!387 
Aql 724 



hflX 
hkiPl 
hJoP: 
hksP3 
hkiP4 
hoxX 
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•hvpE 
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klR 
MR I 
l«R2 
merR 
niLA 

ntrC3' 
ntrC4 
obe 
phoB 
phol." 
spoT 
xvlR 



Aq534 
Aq831 
Aq490 
Aql 207 
Aqt418 
Aq2l3 
Aql908 
Aql 115 
Aq316 
Aq905 
Aq231 
Aql 156 
Aq093 
AqlOl* 
Aq672 
Aq764 
Aq638 
Aql 038 
Aq702 . 
Aq218 
Aqlll? 
Aql792 
Aq230 
Aql64 
Aq2069 
Aq319 
Aq906 
Aq844 
Aql 496 

DNA Replication and Repair 
Aq35S " " 

Aq322 
Aql 472 
Aq910 
Aql 008 
Aql4« 
Aql 882 
Aq932 
Aql 855 
Aql422 
Aql 693 
Aq980 
Aql 026 
Aq2057 
Aql4&4a 
Aq2174 
Aql 394 
Aq633 
Aql 578 
Aq308 
Aql 242 
Aql449 
Aq282 
Aql 72 
Aq4% 
Aql629 
Aq710 
Aql 495 
Aql 628 
Aql 967 
Aql610 
Aq2150 
Aq2053 
Aq2155 
Aq561 
Aql 478 
Aq793 
Aql 886 
Aq064 
Aq657 
Aql 159 



jn 7 r> dihvdroorouse dfhydrogenaie 50.5%.. . 
pvrDB dihydrooroiate dehydrogenase electron transfer 

subunit 34.7% ... 

pytF orotidine-5'-pho(prutedecarbox>liie 37.2%... 

pyrG CTP i>-ntheuse 57j%... 

pliH I'MP kinase 62.1%... 

tnv thjTmdybte synthjx complementing protein * 30.5% ... 

Irn k thymidilate kinase 35.1%... 

umpS uridine 3- monophosphate synthase 42.1%... 

uri p uracil phosphoribosyltransferase 42.0%... 

acrRl iranscriptiorul regulator (TetR.'.AcrR family) 34.1% ... 

tct%2 iran>criptional regulator (TetR/AcrR family! 31.0% ... 

acrR3 tr jnicripuonal regulator (TetR/AcrR family) 29,7% ... 

ar4 R tranvripiional regulator (ArtR family) 35.3% .. 
<wr transcriptional regulator 

iDegT/Dnrl/En-cl family) 34.1%.. 

draG ADP-ribos>1glycohydrolase 32.1%.. 

eXiB trans-regulatory protein ExsB 38.5%..- 

fnr tranacripiional regulator (Crp/Fnr family) 29.5% .. 

rucR) transcriptional regulator (FurR family) 37.9% .. 

njrR2 transcriptional regulator (FurR family) 34.6% .. 

Pll-like protein GlnBi 48.0% .. 

GTP-binding protein HflX 40.3% . . 

histidine kinase sensor protein 27.7% . , 

histidtne kinase sensor protein 28.1% .. 

histidine kinase sensor protein 23.6% .. 

histidine kinase sensor protein 28.2% 

hydrogenase regulation HoxX 46.7% .. - 

transcriptional regulator (H-T-H) 50.2% ..- 

h>idrogenave expression/formation protein 44.3% ..- 

transcriptional regulatory protein HypF 44.8% .. 

transcriptional regulator (IcIR family) 30.4% 

transcriptional regulator (LysR family) 32.8% .. 

tran>criptional regulator fL>*sR family) 28.9% 

transcriptional regulator (MerR family) 32.8% .. 

transcript iiinal regulator (NifA family) 42.8% .. 

transcriptional regulator (NtrC famih'l 41.0% .. 

. transcriptional regulator I NtrC family) 40.2% ..- 

' transcriptional regulator (NtrC family) 40.0% .. 

transcript ional regulator (NtrC family) 38.3% .. 

GTP-binding protein 54.9% .. 

transcriptional regulator (PhoB-like) 41.6% .. 

transcriptional regulator (PhoL'-like) 41.9% .. 

(pippGpp 3-pyrophosphohydrolase 47.2% ..- 

transcriptionaj regulator (NagOXylR family) 29.3% .. 



dinG 
dniA 
dnaB 
dnaC 
dnaE 
dnaG 
dr.aN 
dnaO 
dr.aX 
dpbF 
dpIF 
evrA 
g^rB 
helX 
him.\ 
ihtB 
lie 

mutL 
mutSl 
mutS2 
mutT 
mutYl 
mutY2 
muiY3 
nt"o 
nucl 
ozt 
pol 
poU 
radC 
recA 
recG 
red 
recN 
recR 
rep 
sbcD 
ssb 
topA 
topGl 
topG2 
uvrA 
uvrB 
uvrC 



ATP-dependent helicase (DinG family) 27.9% .. 

chromosome replication initiator protein DnaA 36.5% .. 

replicative DNA helicase 40.3% .. 

DNA replication protein DnaC 26.4% .. 

DNA polymerase III alpha subunit 41.9% .. 

DNA primase 39.8% .. 

DNA polymerase III beta chain 32.1% .. 

DNA polymerase III epsilon subunit 40.0% . 

DNA polymerase III gamma subunit 36.6% . 

DNA polymerase beta family 39.1% . 

N-terminus of phage SPO1 DNA polymerase 37.3% . 

DNA gyrase A subunit 43.6% . 

gyrase B 55.2% . 

DNA helicase 49.7% . 

DNA binding protein HU 40.2% . 

integration host factor beta subunit 35.8% .- 

DNA ligase (ATP dependent) 50.8% . 

DNA ligase (NAD dependent) 45.7% . 

DNA mismatch repair protein MutL 72.3% . 

DNA mismatch repair protein MutS 77.5% . 

DNA mismatch repair protein MutS 37.0% . 

8-oxo-dGTPase domain (mutT domain) 46.3% . 

endonuclease III 53.6% . 

endonuclease III 51.8% . 

endonuclease III 43.4% . 

deoxyribonuclease IV 39.0% . 

thermococcal nuclease homolog 36.4% . 

O-6-methylguanine-DNA-alkyltransferase 36.9% . 

DNA polymerase I 3o' exo domain 43.2% . 

DNA polymerase I (PolI) 30.5% . 

DNA repair protein RadC 39.0% . 

recombination protein RecA 88.5% . 

ATP-dependent DNA helicase RecG 38.9% . 

single-strand-DNA-specific exonuclease RecJ 31.8% . 

recombination protein RecN 27.7% .- 

recombination protein RecR 38.3% . 

ATP-dependent DNA helicase REP 33.4% . 

ATP-dependent dsDNA exonuclease 29.9% . 

single stranded DNA-binding protein 39.4% . 

topoisomerase I 39.6% . 

reverse gyrase 41.6% . 

reverse gyrase 35.1% . 

repair excision nuclease subunit A 61.0% . 

repair excision nuclease subunit B 53.9% . 

repair excision nuclease subunit C 32.5% . 



Aq686 
Aql 856 
Aq2126 

Transcription 

RNA polymerase and transcription factors 

Aq613 deaD ATP-dependent RNA helicase DeaD 42.3% 

Aq357a flgM anti sigma factor FlgM 20.6% 

Aq1218 fliA RNA polymerase sigma factor FliA 37.2% 

Aq259 nusA transcription termination NusA 45.4% 

Aq133 nusB transcription termination NusB 32.3% 

Aq1931 nusG transcription antitermination protein NusG 46.3% 

Aq873 rho transcriptional terminator Rho 59.6% 

Aq070 rpoA RNA polymerase alpha subunit 40.4% 

Aq1939 rpoB RNA polymerase beta subunit 44.0% 

Aq1945 rpoC RNA polymerase beta prime subunit 46.9% 

Aq1490 rpoD RNA polymerase sigma factor RpoD 41.6% 

Aq599 rpoN RNA polymerase sigma factor RpoN 30.6% 

Aq1452 rpoS RNA polymerase sigma factor RpoS 40.5% 



RNA modification 



Aql SI 6 
Aql 067 

Aq4ll 

Aql 1 58 

Aq221 

Aq894 

Aq946 

Aql 955 

Aq924 

Aql 661 

Aql308 

Aq841 



ksg.A dimeth>dadenosine transferase 36.1% 

miaA tRNA ddta-2-isopentenylpyrophosphate UPP) 

transferase 38.2% 

pcnBl poly A polymerase 28.5% 

pcnB2 poly A polymerase 33.9% 

phpA pohrtbonudeotide nudeotidvltransferase 45.0% 

que\ queuosine biosynthesis protein 46.9% 

mc RNase 111 35.8% 

mhB RN*« HI1 48.4% 

mpH RNase PH 64.0% 

spoU rRNA methytase SpoU 44.0% 

tgt queuine tRNA-nTwsvltjaniferase 52.6% 

trml N2.N2 -dim ethyl guanosine tRNA 



AqH89 
Aq749 
Aq705 
Aql890 
Aq2046 
Aq257 

Translation 

Aq2l31 

Aq247 

Aq46t 

Aq2l47a 

Aq346 



irmD 
truA 
truB 
UnR 
vacB 
«cA 

ftnt 
gatA 
gJtB 
gatC 
pth 



Aminoacyl tRNA synthetases 



Aql293 
Ao923 
Aql677 
Aql068 
Aq763 
Aql22l 
Aq945 
Aq2141 
Aql22 
AqlISS 
Aq305 
Aq35t 
Aql770 
Aql202 
Aql257 
Aq422 
Aq953 
Aql730 
Aq36S 
Aq298 
Aql667 
Aq992 
Aql75l 
Aql4l3 

Ribosomal Proteins 



alaS 
argS 
aspS 

cysS 
genX 
gltX 
glyQ 

«lys 

hisSI 
hisS: 
ileS 
leuS 
leuS' 
lysL' 
metG 
metG' 
pheS 
phcT 
proS 
serS 
thrS 
trpS 
tvrS 
valS 



Aql935 

AqOI3 

Aq009 

AqOll 

Aql652 

Aql649 

Aq2042 

Aql 936 

Aql933 

Aql937 

Aql877 

Aql 634 

Aql642 

AqOlS 

Aq069 

Aql648 

Aql934 

Aq952 

AqOlba 

Aq012 

Aql653 

Aql 644 

Aql 930a 

Aq792a 

Aql485 

Aq2007 

AqOI7 

Aq072 

Aql645 

Aq063 

Aql 832 

Aq734 

Aql65l 

Aql 878 

AqOOS 

Aq073 

Aq735 

Aql834 

Aq074 

Aql65la 

Aq226a 

AqI23 

Aq020 

Aq064a 

Aq015 

Aql 767 

Aq867a 



rplA 

rplB 

rplC 

rplD 

rplt 

rplF 

rpll 

rpl) 

rplK 

rp!L 

rplM 

rplN 

rplO 

rpIP 

rplQ 

rplR 

rp!S 

rplT 

rpIV 

rplW 

rplX 

rpmD 

rpmC- 

rpml 

rpsA 

rpsB 

rpsC 

rpsP 

rps£ 

rpsF 

rpsGl 

rpsC2 

rpsH 

rpsl 

r P s| 

rpsK 

rpsLl 

rpsL2 

rpsM 

rpsN 

rpsO 

rpsP 

rp*Q 

rpsR 

rpsS 

rpsT 

rpsU 



Translation factors 



Aql364 
Aq2l 14 
Aq7l2 
AqOOl 
Aq075a 
Aq2Q32 
Aql777 
Aq876 
Aql840 
Aql033 
Aq7l5 
Aq005 
Aql92S 

Protein modification 



erp 

eif 

frr 

fusA 

inlA 

in IB 

in PC 

prlA 

prtB 

selB 



tufAl 
tufA2 



Aq73l 

AqS79 

Aq2093 

Aq055 

Aq2043 

Aql053 

Aq739 

AqlS71 

Aq2102 

Aq567 

Aq576 

Aql 52 

Proteases 
Aql950 
Aql 672 
Aql 296 
Aql 339 
Aql 337 
An I'M 5 



ccdA 
def 
dsbC 
hem XI 
hemXi 
nil5 1 
nifS2 
pmbA 
prmA 
riml 
stpK 
UpA 

aprV 
dpi* 
dpC 
dpP 



methyltransferase 

tRNA guanine-N1 methyltransferase 
pseudouridine synthase I 
tRNA pseudouridine 55 synthase 
rRNA methylase 

VacB protein (ribonuclease II family) 
RNA methyltransferase (TrmA-family) 

methionyl-tRNA formyltransferase 
glutamyl-tRNA(Gln) amidotransferase subunit A 
glutamyl-tRNA(Gln) amidotransferase subunit B 
glutamyl-tRNA(Gln) amidotransferase subunit C 
peptidyl-tRNA hydrolase 

alanyl-tRNA synthetase 
arginyl-tRNA synthetase 
aspartyl-tRNA synthetase 
cysteinyl-tRNA synthetase 
lysyl-tRNA synthetase (genX) homolog 
glutamyl-tRNA synthetase 
glycyl-tRNA synthetase alpha subunit 
glycyl-tRNA synthetase beta subunit 
histidyl-tRNA synthetase 
histidyl-tRNA synthetase 
isoleucyl-tRNA synthetase 
leucyl-tRNA synthetase alpha subunit 
leucyl-tRNA synthetase beta subunit 
lysyl-tRNA synthetase 
methionyl-tRNA synthetase alpha subunit 
methionyl-tRNA synthetase beta subunit 
phenylalanyl-tRNA synthetase alpha subunit 
phenylalanyl-tRNA synthetase beta subunit 
proline-tRNA synthetase 
seryl-tRNA synthetase 
threonyl-tRNA synthetase 
tryptophanyl-tRNA synthetase 
tyrosyl-tRNA synthetase 
valyl-tRNA synthetase 

ribosomal protein L01 
ribosomal protein L02 
ribosomal protein L03 
ribosomal protein L04 
ribosomal protein L05 
ribosomal protein L06 
ribosomal protein L09 
ribosomal protein L10 
ribosomal protein L11 
ribosomal protein L7/L12 
ribosomal protein L13 
ribosomal protein L14 
ribosomal protein L15 
ribosomal protein L16 
ribosomal protein L17 
ribosomal protein L18 
ribosomal protein L19 
ribosomal protein L20 
ribosomal protein L22 
ribosomal protein L23 
ribosomal protein L24 
ribosomal protein L30 
ribosomal protein L33 
ribosomal protein L35 
ribosomal protein S01 
ribosomal protein S02 
ribosomal protein S03 
ribosomal protein S04 
ribosomal protein S05 
ribosomal protein S06 
ribosomal protein S07 
ribosomal protein S07 
ribosomal protein S08 
ribosomal protein S09 
ribosomal protein S10 
ribosomal protein S11 
ribosomal protein S12 
ribosomal protein S12 
ribosomal protein S13 
ribosomal protein S14 
ribosomal protein S15 
ribosomal protein S16 
ribosomal protein S17 

ribosomal protein S18 
ribosomal protein S19 
ribosomal protein S20 
ribosomal protein S21 

elongation factor P 

initiation factor eIF-2B alpha subunit 

ribosome recycling factor 

elongation factor EF-G 

initiation factor IF-1 

initiation factor IF-2 

initiation factor IF-3 

peptide chain release factor RF-1 

peptide chain release factor RF-2 

elongation factor SelB 

elongation factor EF-Ts 

elongation factor EF-Tu 

elongation factor EF-Tu 

cytochrome c-type biogenesis protein 
polypeptide deformylase 
thiol:disulfide interchange protein 
cytochrome c biogenesis protein 
cytochrome c biogenesis protein 
FeS cluster formation protein NifS 
FeS cluster formation protein NifS 
peptide maturation 

ribosomal protein L11 methyltransferase 
ribosomal-protein-alanine acetyltransferase 
ser/thr protein kinase 
thiol disulfide interchange protein 



serine protease 

ATPase subunit of ATP-dependent protease 
ATP-dependent Clp protease 
ATP-dependent Clp protease proteolytic subunit 
ATP-dependent protease ATPase subunit ClpX 
collagenase 



34.6% .... 
42.9% .... 
33.1%.... 
38.2% .... 
36.4% .... 
37.9% .... 
28.8% .... 

45.7% .... 
53.6% .... 
48.8% .... 
41.1%.... 
48.8% .... 

46.6% .... 
39.4% .... 
51.3%.... 
45.0% .... 
38.6%.... 
48.5% .... 
61.9%.... 
37.1%.... 
43.3% .... 
34.9% .... 
82.1%.. 
50.7% .... 
47.2%.... 
53.2% ... 
45.0% .... 
64.2% .... 
51.9%.... 
35.4%.... 
44.1%.... 
59.4% .... 
48.5% .... 
38.4%..., 
56.2% .... 
33.2% .... 

57.9%..., 
46.9%..., 
53.8% .... 
51.3% .... 
67.0%... 
46.2%... 
35.6%... 
36.5%... 
71.4% ... 
75.4% ... 
60.6% ... 
59.S% ... 
57.4% ... 
59.3% ... 
<8.7% ... 
62.7% ... 
59.8% ... 
63.5% ... 
47.3% ... 
52.2% ... 
30.8% ... 
46.4% .. 
67.9%... 
48.3%... 
32.6% ... 
60.3% ... 
54.0% ... 
51.9%... 
60.6% ... 
32.7%... 
52.5% ... 
51.9% ... 
39.9% ... 
50.5%... 
53.9%... 
60.7%.., 
78.9%.., 
78.9% .., 
61.9%.., 
51.6%... 
61.6% ... 
36.6%... 
59.6% .., 
48.5%.., 
63.1%.. 
40.0% .., 
38.2% .. 

48.6% .. 
58.4% .. 
43.0% .. 
91.9%.. 
69.1%.. 
43.5% .. 
53.6% .. 
54.8% .. 
49.9% .. 
30.4% .. 
35.8% .. 
74.4% .. 
73.9% .. 

32.0% .. 
41.4% .. 
27.6% .. 
26.2%.. 
36.2% .. 
38.5% .. 
45.5% .. 
25.6% .. 
35.1%.. 
37.9% .. 
30.8% .. 
37.6% .. 

26.5% .. 
46.8% .. 
54.9%.. 
65.4% .. 
6h.l%.. 
4l.3*o.. 



Aql67l 
Aql450 
Aq242 
Aq076 
AqH59 
Aq2099 
Aql535 
Aq618 
Aq797 
Aq552 
Aq2204 
Transport 
Aql222 
Aq620 
Aql095 
Aql094 
Aql097 
Aq4l7 
Aq4I3 
Aq297 
Aq2l60 
Aql53l 
Aq2122 
Aq2l37 
Aql563 
Aq695 
Aql 122 
Aq469 
Aq786 
Aql 12 
Aq6«2 
Aq343 
Aq85l 
Aq724 
Aql445 
Aql 125 
Aqll32 
Aqt331 
Aq468 
Aql073 
Aq9ll 
Aql062 
Aql 255 
Aql 330 
Aql .'68 
Aql863 
Aql725 
Aql229 
Aq447 
Aql609 
A0M6 
Aq4l5 
Aq929 
Aq2030 
Aq2l5 
Aql441 

Aq48l 
Aql509 
Aq2019 
Aql055 
Aq2018 
Aq20l6 
Aq2129 
Aq098 
Aq2077 
Aq2l06 
Aql988 
Aql504 
Aq03l 

Uncategorized 
Aq!023 
Aq2110 
Aql 58 
Aq458 
Aq542 
Aqt47 
Aql 303a 
Aql 265 
Aq348 
Aq2l2 
Aq337 
Aq528 
Aql 48 
Aq2095 
Aql994 
Aql919 
Aql 540 
Aql052 
Aql657 
Aq944 
Aql 108 
Aql 458 

AqtOSb 
AqlOl 
Aq2l20 
Aql09l 
Aq708 
Aql 925 
Aql 579 
Aql 983 
Aq748 
Aql 739 
Aql977 
Aql 560 
Aql823 
Aql789 
Aq587 
Aql 820 
Aq896 
Aql 300 
Aql 507 
Aq967 
Aql4l 
Aq994 
Aq057 
Aq287 
Aq832 
.\q87l 
Aq2021 
Aq773 



hsIV heat shock protein HsLV 57.6% .... 

htr A periptasmic serine protease ^* 

Ion Lon protease so .6% Z!I 

map methionyl ammopeptidase 44.1% 

npr neutral protease 27.7% „*" 

pepA leucine aminopeptidase 39 j% .... 

pepQ xaa -pro di peptidase 31.9%.... 

prpl protease I 4U%..„ 

pre carboxyl-termtnal protease 41 jo^ mm 

sms ATP-dependent protease ims 46_2<^ .... 

ymxG processing protease - 283% .... 

abcTl ABC transporter 34.7% .... 

abcT2 ABC transporter 36^% .... 

abcT3 ABC transporter (AflC-2 siAtamily) 34.4* .... 

abcT4 ABC transporter 37.7% .... 

abcT5 ABC transporter (hh/B subfamily) 45^%..., 

abcT6 ABC transporter 51.8%.... 

abcT7 . ABC transporter 515% .... 

abcTS ' ABC transporter 49J% .... 

abcT9 ABC transporter 4SJ% .... 

abcTIO ABC transporter 36.4% .... 

abcT 1 1 ABC transporter 425% .... 

ibcTU ABC transporter 38.2%.... 

abcTlJ ABC transporter (MsbA subfamily) 30.5% .... 

iciDl cation efflux system (AcrB/AcrD/AcrF family) 22.7% .... 

acrD: cation efflux system (AcrB/AcrD/AcrF family) 32.0% .... 

a;rD3 cation efflux system (AcrB/AaD/AcrF family) 34 J% .... 

acrD4 cation efflux (AcrB/AcrD/AcrF family) 27.7% .... 

amiB ammonium transporter 49.0% .... 

arsAI anion transporting ATPase 41.5% 

arsA2 anion transporting ATPase 33.9% .... 

i0 T.\ Mgt2 + I and Co(2+) transport protein 31.1% ... 

nrAl cation transporting ATPase (E1-E2 family) 30.7% .... 

ctrA2 cation transporting ATPase (EI -E2 family) 28.1% .... 

ar.\3 cation transporting ATPase (E1-E2 family) 43.8% .... 

cicBI cation efflux system (czcB-like) 23.7%.... 

czeb: cation efflux s>stem (czcB-like) 26.9% .... 

ezcBJ cation efflux system (czcB-like) 28.5% .... 

cicD cation efflux system (CzcD-like) 43.4% .... 

ebs erythrocyte band 7 homolog 50.2%.... 

emrB major facilitator family transporter 28.3% .... 

(VoB ferrous iron transport protein B 32.6% .... 

ghp proton/sodium -glutamate symport protein 35.b% .... 

hvsT high affinity sulfate transporter 29.4°o.... 

kch potassium channel protein 30.1% ... 

ItfpA G -protein LepA 59.S% . . . . 

mttT transporter (major facilitator family) 37.2% .... 

mc:C Mgl 2+) transport ATPase 36^%..., 

mod.\ molybdate periplasmk binding protein 38.2% .... 

modC Molybdenum transport system permease 44.8% .... 

napAl Na(t)/H( + ) antiporter 27.6V .... 

napA2 Nat+)/H( + ) antiporter 317%..., 

napA3 Nat +)/H(+) antiporter 26.S% .... 

na>.\ nitrate transporter 35.8%... 

oppA transporter (extracellular solute binding 

protein family 5) 37.0% .... 

oppB transporter (OppBC family) 46.2% .... 

oppC • oligopeptide transport system permease 46.2%.... 

pstA phosphate transport s>*stem permease PstA 435% .... 

pstB phosphate transport ATP binding protein 68.1% .... 

pstC phosphate transport system permease protein C 45.2% 

psiS phosphate-binding periplasmic protein 514% .... 

sbt Na( + ) dependent transporter (Sbf family) 34.9% 

secG protein export membrane protein SecG 35.7% .... 

snt Nat +):neurotransmitter s>-mporter (Snf family) 25.7%... 

sst Na( + ):solute symportet {Ssf family) 47.4% . 

tolQ TolQ homolog 315% . 

trkl K+ transport protein homolog 40.6%... 

trnS transporter (Pho87 family) 46.8% ... 

atuCl acetoin utilization protein 36.9%... 

acuC2 acetoin utilization protein 38.6% .. . 

jpfA AP4A hydrolase 36.6%... 

bep bacterioferriiincomigratory protein 40.6%... 

bcpC phosphonopyruvate decarboxsHase 37.4% ... 

cob\V cobalamin synthesis related protein Cob\V 295% ... 

cspC cold shock protein 67.2% ... 

cst\ carbon starvation protein A 33.0%... 

etc general stress protein Ctc 34.7%... 

cyn$ cyanate hydrolase 39.5 _, t... 

cv-sQ C>-sQ protein 47.4%... 

dedF phenylacrylic acid decarboxylase 52.4% ... 

dtfi>C deoxyribose-phosphate aldolase 46.6-? ... 

6kiA dnaK suppressor protein J5 1^ "* 

era I GTP-binding protein Era 49.7*".... 

era: GTP binding protein Era 43.0% ... 

ccpEL GcpE protein 50.l c i ... 

gcsH I glycine cleavage system protein H 28.6% ... 

gcsH2 glycine cleavage system protein H 39.8% ... 

gcsH3 glycine cleavage system protein H 36.7% ... 

«*H4 gK-cinc cleavage system protein H 44.?% ... 

gcvT aminometh>itransferase 

(gljxine cleavage system T protein) t' Z? "" 

rifq host factor I 53.5"- ... 

h!v hemolysin 33.7%... 

hjiC hemolysin homolog protein "' 

hyl\ hemolysin 33.?"- ... 

h\MA N-methylhydantoinaseA 39^% ... 

hyuB N'-methylh>-dantoinase & 43.1%... 

ij'gB invasion protein lagB 385% ... 

impi myo-inositol- Kor 4J-monophosphatase ^> ^» »• 

bpA gcranylgerans'l pyrophosphate synthase 40. . . . 

ImB LytB protein' 43.9%... 

mai.\ enolase-phosphauseE-1 42.3%... 

mgUl gliding motility protein 42.4^, ... 

mglA2 gliding motility protein MglA 34.1% ... 

mviB 'virulence factor' homolog MviB 29.7% .. . 

neaC N-ethylammeline chlorohsdrolase 42Jt% — 

nfeD nodulation competitiveness protein NfeD 37.9"» ... 

nitV N'ifU protein '" 

omp outer membrane protein 255%-. 

omi O-mcthyttransferase 39.5% ... 

oit\ organic solvent tolerance protein 22.0^ ... 

pkcl protein kinase C inhibitor ( HIT family! 59.0% .. . 

pncA pyrazinamidase/nicotinamidase 39. 1^ . . . 

sugar fermentation stimulation protein 27.3**. ... 

v nib small protein B 52.V-. ... 

>ur E stationary phase survival protein SurE 44.!"*.... 

ihdF thiophene and furan oxidation protein 45.4--. ... 

tuil> Tldnpmtein tf." . ... 

t l^ hcmoUsin ... 



from CheA are transferred to CheY, which then binds to the flagellar 
switch, altering the direction of flagellar rotation. Homologous 
chemotaxis systems are present in the archaea Halobacterium 
salinarum 29 and Pyrococcus sp. OT3 (H. Sizuya, personal 
communication), although the bacterial and archaeal flagellar apparatuses 
are not homologous 30. The M. jannaschii genome also lacks 
homologues of known genes required for chemotaxis. Thus, either 
motility in A. aeolicus and M. jannaschii is undirected or input for 
controlling taxis is mediated through another, unidentified system. 
The most studied chemotaxis systems respond to sugars and amino 
acids, although responses to other inputs (for example, metals, 
redox potential, and light) may also occur. In contrast to all the 
organisms known to possess the classical chemotactic 
signal-transduction pathways, both A. aeolicus and M. jannaschii are 
obligate chemoautotrophs. Chemoautotrophs may respond to a 
different set of factors, such as concentrations of dissolved gas (CO2, 
H2 or O2) or another critical parameter such as temperature. 

In E. coli, the flagellar switch is essential for flagellar structure and 
function and coupling of chemotaxis signals. But the A. aeolicus 
genome encodes homologues of only two of the three E. coli 
proteins that make up the switch, FliG and FliN. Biochemical 31 
and genetic 32 studies implicate the missing FliM protein as the 
receptor for phosphorylated CheY, the switch signal. The absence of 
both FliM and CheY in A. aeolicus supports the identification of 
FliM as the receptor for phosphorylated CheY in E. coli. This result 
also argues against a direct role for FliM in torque generation. 

DNA replication and repair 

The A. aeolicus primary replicative DNA polymerase, corresponding 
to the DNA polymerase III holoenzyme in E. coli, probably consists 
of a core structure containing α- and ε-subunits, a γ-τ-subunit and 
an additional member of the γ-τ/δ′-family. A gene encoding a 
protein homologous to the β-sliding clamp was also found. This 
minimalistic complex lacks homologous θ-, δ-, χ- and ψ-subunits, 
as does the Mycoplasma genitalium holoenzyme 3. Translation of the 
54K (relative molecular mass) γ-τ-ATPase subunit may proceed 
without a programmed frameshift to produce a protein similar to 
the N-terminal region of the E. coli γ-subunit. DNA polymerase I is 
present as separate Klenow fragment and 5′→3′ exonuclease 
subunits, encoded by two non-adjacent ORFs. Although the 
repair polymerase, DNA polymerase II, has not been found in 
A. aeolicus, one ORF (Aq1422) encodes a protein similar to the 
eukaryotic DNA repair polymerase-β. A member of the same 
family has been identified in Thermus aquaticus and Bacillus 
subtilis. 

Figure 2 Histogram representation of the similarity of selected classes of 
predicted proteins to predicted proteins from the E. coli (EC) and M. jannaschii 
(MJ) genomes. Predicted A. aeolicus proteins representing each category were 
independently compared to sets of all potential polypeptides (>100 amino acids) 
from the two genomes using FASTA. If the top scoring alignment covered >80% 
of the length of the A. aeolicus protein, the score was plotted. There were more 
positives found in the E. coli genome in nearly every category. Hypothetical 
proteins (those identified by database match but of unknown function) are very 
similarly represented by M. jannaschii and E. coli. There are a small number of 
very highly conserved hypotheticals that are shared between A. aeolicus and 
M. jannaschii. Generally, biosynthetic categories show less discrimination than 
information-processing categories, which are clearly more E. coli-like. The 
variation in the apparent rates of evolution in different categories suggests that 
different phylogenies may be inferred depending on the sequence analysed. 
Within each graph, correspondence to E. coli is shown in white and M. jannaschii 
is shown in black. Avg id, average identity; count, number of proteins analysed. 

Transcriptional and translational apparatuses 

The transcriptional apparatus of A. aeolicus is similar to that of E. 
coli and lacks any components specific to the Eukarya or Archaea 
(Fig. 2). In addition to the core RNA polymerase α-, β- and β'- 
subunits, four σ-factors which determine promoter specificity are 
present (Table 1). Several different families of bacterial transcrip- 
tional regulators were also identified, including two-component 
systems. All of the ribosomal proteins and elongation factors 
common to other bacteria are present, indicating that all bacteria- 
specific ribosomal proteins were present in the common ancestor of 
Aquifex and other bacteria. Also present are the four sel genes 
required for the cotranslational incorporation of selenocysteine. 
These latter genes are clustered in a 15-kilobase-pair segment that 
also encodes the biosynthetic and structural proteins for formate 
dehydrogenase, the only selenocysteine-containing protein identi- 
fied. The gene that encodes selenocysteine transfer RNA, selC, is 
apparently cotranscribed with the genes encoding the formate 
dehydrogenase structural proteins. 

A. aeolicus lacks glutaminyl-tRNA and asparaginyl-tRNA synthetases. 
The genes required for transamidation of glutamyl-tRNA(Gln) 
are present 34 . Charging of asparaginyl-tRNA is likely to proceed 
through the analogous reaction, as shown in halobacteria 35 , 
although the gene(s) for that transamidase are unknown. The 
canonical methionyl- and leucyl-tRNA synthetases have only been 
seen previously as single polypeptide enzymes; however, in A. 
aeolicus the homologues appear fragmented into two subunits. In 
both cases, the genes that encode the N- and C-terminal portions 
are widely separated on the chromosome. No complete three- 
dimensional structural data are available for either methionyl- or 
leucyl-aminoacyl tRNA synthetases, but the subunit organization in 
the A. aeolicus aminoacyl-tRNA synthetases may reflect domain 
organization in the homologous proteins. 



Thermophily 

The A. aeolicus genome is the second completely sequenced genome 
of a hyperthermophile. By comparing the A. aeolicus and M. 
jannaschii genomes and contrasting them with the complete 
genomes of mesophiles, we can discover whether there are aspects 
of the genome or the encoded information that are diagnostic of 
hyperthermophiles. The G + C content of the stable RNAs is clearly 
indicative of the high growth temperature of the organism. This 
property can be used to identify stable RNAs against the relatively 
low G + C background of the A. aeolicus genome. The gene 
encoding tmRNA (or 10Sa RNA) 36 , an RNA involved in tagging 
polypeptides translated from incomplete messenger RNAs for 
degradation, was located in this way. 
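
The idea of picking out G + C-rich stable RNA genes against a G + C-poor background can be illustrated with a simple sliding-window scan. This is only a minimal sketch: the window size, step and 60% threshold are illustrative assumptions, not values taken from the paper, and the toy sequence is invented.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def high_gc_windows(genome: str, window: int = 100, step: int = 50,
                    threshold: float = 0.60):
    """Return (start, end, gc) for every window whose G + C fraction
    reaches the threshold. Window, step and threshold are assumed values."""
    hits = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            hits.append((start, start + len(chunk), gc))
    return hits


# Toy sequence: an AT-rich background with one GC-rich insert (hypothetical data).
toy = "AT" * 100 + "GC" * 50 + "AT" * 100
for start, end, gc in high_gc_windows(toy):
    print(start, end, round(gc, 2))
```

On a real genome the flagged windows would only be candidates; each would still need to be checked against known RNA families.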

Two genes for reverse gyrase are present in the genome. This is the 
only protein known to be present only in thermophiles. Other 
proteins, currently described as hypotheticals, may be diagnostic of 
hyperthermophiles but the data sets are not yet large enough to 
decide this with confidence. 

Although features of stabilization may not be apparent in any 
given protein 37 , a large enough data set may reveal general trends in 
amino-acid usage that are informative. Particularly important in 
this regard is inclusion of multiple genomes of hyperthermophiles 
so as not to allow the idiosyncrasies of a single organism to bias the 
conclusions. As shown in Table 2, comparison of the amino-acid 
composition encoded by six genomes shows that use of individual 
amino acids can vary significantly from genome to genome. The 
data suggest trends that may be correlated with the thermostability 
of the encoded proteins. One apparent trend is that the hyper- 
thermophile genomes encode higher levels of charged amino acids 
on average than mesophile genomes 38 , primarily at the expense of 
uncharged polar residues. Glutamine in particular seems to be 
significantly discriminated against in the hyperthermophiles. 
Although this observation might be rationalized on the basis of 



Table 2 Comparison of relative amino acid compositions (in percentages) of mesophiles and thermophiles 

             Mesophiles                                            Thermophiles
Amino acid   H. influenzae   H. pylori   E. coli   Synechocystis   A. aeolicus   M. jannaschii
A             8.21            6.83        9.55      9.07            5.90          5.54
C             1.03            1.09        1.11      1.01            0.79          1.27
D             4.98            4.77        5.20      5.07            4.32          5.52
E             6.48            6.88        5.91      6.20            9.63          8.67
F             4.46            5.41        3.87      3.75            5.13          4.20
G             6.65            5.76        7.42      7.77            6.75          6.41
H             2.05            2.12        2.26      1.93            1.54          1.43
I             7.10            7.20        5.95      6.31            7.32         10.45
K             6.32            8.94        4.48      4.26            9.40         10.36
L            10.50           11.18       10.56     10.93           10.57          9.38
M             2.44            2.28        2.86      2.12            1.92          2.33
N             4.89            5.83        3.88      3.76            3.60          5.24
P             3.72            3.28        4.41      5.09            4.07          3.38
Q             4.64            3.70        4.42      5.26            2.04          1.44
R             4.47            3.46        5.58      5.18            4.91          3.85
S             5.84            6.81        5.67      5.46            4.79          4.46
T             5.20            4.37        5.35      5.53            4.21          4.06
V             6.68            5.59        7.11      7.10            7.93          6.85
W             1.12            0.70        1.48      1.30            0.93          0.71
Y             3.12            3.68        2.83      2.78            4.13          4.33

                                      Mesophiles   Thermophiles
Charged residues (DEKRH)                24.11        29.84
Polar/uncharged residues (GSTNQYC)      31.15        26.79
Hydrophobic residues (LMIVWPAF)         44.74        43.36



an increased rate of deamidation of this residue at higher temperatures, 
asparagine does not appear subject to similar discrimination. 
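
The class totals quoted at the foot of Table 2 can be approximately recomputed from the per-residue percentages. The sketch below assumes that the table rows run alphabetically by one-letter code (A, C, D, ... Y), which is how the values were read here, and that the group figures are unweighted means across genomes; both are assumptions rather than statements from the paper, so small differences from the printed totals are expected.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Percent composition per genome, transcribed from Table 2
# (values listed in the order of AMINO_ACIDS; row order is an assumption).
TABLE2 = {
    "H. influenzae": [8.21, 1.03, 4.98, 6.48, 4.46, 6.65, 2.05, 7.10, 6.32, 10.50,
                      2.44, 4.89, 3.72, 4.64, 4.47, 5.84, 5.20, 6.68, 1.12, 3.12],
    "H. pylori":     [6.83, 1.09, 4.77, 6.88, 5.41, 5.76, 2.12, 7.20, 8.94, 11.18,
                      2.28, 5.83, 3.28, 3.70, 3.46, 6.81, 4.37, 5.59, 0.70, 3.68],
    "E. coli":       [9.55, 1.11, 5.20, 5.91, 3.87, 7.42, 2.26, 5.95, 4.48, 10.56,
                      2.86, 3.88, 4.41, 4.42, 5.58, 5.67, 5.35, 7.11, 1.48, 2.83],
    "Synechocystis": [9.07, 1.01, 5.07, 6.20, 3.75, 7.77, 1.93, 6.31, 4.26, 10.93,
                      2.12, 3.76, 5.09, 5.26, 5.18, 5.46, 5.53, 7.10, 1.30, 2.78],
    "A. aeolicus":   [5.90, 0.79, 4.32, 9.63, 5.13, 6.75, 1.54, 7.32, 9.40, 10.57,
                      1.92, 3.60, 4.07, 2.04, 4.91, 4.79, 4.21, 7.93, 0.93, 4.13],
    "M. jannaschii": [5.54, 1.27, 5.52, 8.67, 4.20, 6.41, 1.43, 10.45, 10.36, 9.38,
                      2.33, 5.24, 3.38, 1.44, 3.85, 4.46, 4.06, 6.85, 0.71, 4.33],
}

MESOPHILES = ["H. influenzae", "H. pylori", "E. coli", "Synechocystis"]
THERMOPHILES = ["A. aeolicus", "M. jannaschii"]

# Residue classes exactly as named in the table footer.
CLASSES = {
    "charged": "DEKRH",
    "polar_uncharged": "GSTNQYC",
    "hydrophobic": "LMIVWPAF",
}


def class_percent(genome: str, residues: str) -> float:
    """Summed percentage of the given residues in one genome."""
    composition = dict(zip(AMINO_ACIDS, TABLE2[genome]))
    return sum(composition[r] for r in residues)


def group_mean(genomes, residues: str) -> float:
    """Unweighted mean of class_percent over a group of genomes."""
    return sum(class_percent(g, residues) for g in genomes) / len(genomes)


for name, residues in CLASSES.items():
    print(name,
          round(group_mean(MESOPHILES, residues), 2),
          round(group_mean(THERMOPHILES, residues), 2))
```

The same helper applied to the single residue "Q" shows the glutamine depletion discussed in the text, while "N" shows no comparable gap.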

Phylogeny 

The placement of the Aquifex lineage as one of the earliest diver- 
gences in the eubacterial tree 13,14 is interesting because of the insights 
it could provide into the ancestral eubacterial phenotype, including 
the hypothesized thermophilic nature of the first bacteria. Protein- 
based phylogenies often do not support the original rRNA-based 
placement 15,16,18 . Thus, the availability of some 1,500 genes from an 
Aquifex species would seem to offer a definitive resolution of the 
phylogeny. However, our analyses of ribosomal proteins, amino- 
acyl-tRNA synthetases, and other proteins do not do so, showing no 
consistent picture of the organism's phylogeny. We cannot make a 
more complete analysis and discussion here, but some observations 
can be made. These proteins do not yield a statistically significant 
placement of the Aquifex lineage or of other major eubacterial 
lineages. This situation partially reflects the inadequacy of some 
protein sequences as indicators of distant molecular genealogy 
because of their particular evolutionary dynamic, including the 
patterns and rates of amino-acid replacements. In some cases (such 
as the aminoacyl-tRNA synthetases for arginine, cysteine, histidine, 
proline and tyrosine), the analyses are further complicated by the 
presence of paralogous genes and/or apparent lateral gene transfers. 
It seems that a more extensive survey of genes and a better sampling 
of major eubacterial taxa will be required to confidently confirm or 
refute an early divergence of the Aquifex lineage. 

Conclusions 

Advances in sequencing techniques have allowed us to move beyond 
studies of single genes to studies of complete genomes only 
recently 2 . This rapid advance has created the opportunity to begin 
to characterize an organism with the full knowledge of the genome 
in hand. The complete genome summarized in this report repre- 
sents our first view of A. aeolicus. The challenge now is to ask specific 
questions in ways which take advantage of the whole-genome data. 

Beyond studies of any single organism in isolation, complete 
genomes allow comprehensive comparisons between organisms. 
For instance, comparisons of the similarity of genes can be made 
that reveal that genes in different categories vary in their relative 
conservation (Fig. 2). In addition, genome-wide trends are appar- 
ent. For example, why is there not more of a tendency to group 
functionally related genes (for example, biosynthetic pathways) into 
operons in A. aeolicus? This was also seen in the genome sequence of 
the autotroph M. jannaschii 1 . Is this because the autotrophic lifestyle 
decreases the need for selective regulation? There also seem to be a 
few multifunctional, fused proteins in A. aeolicus and M. jannaschii. 
Although this seems unlikely to be related to autotrophy, it might be 
associated with extreme thermophily. The large number of diverse 
genome sequences that will become available in the coming years 
will allow more detailed correlation of global genomic properties 
with particular physiologies. 



Methods 

Sequencing strategy. The sequencing strategy used to assemble the complete 
genome was based on the whole genome random (or 'shotgun') approach, 
which has been successfully used for other genomes of similar size 1-4 . Shotgun 
sequencing projects are characterized by two phases: an initial completely 
random phase in which the bulk of the data is collected, followed by a closure 
phase where directed techniques are used to close gaps and complete the 
assembly. By pursuing a strategy that initially targeted only 97% coverage, 
we were able to limit the number of sequences needed for the 
random phase to only 10,500 (ref. 39). 

Sequences were generated from a small insert library constructed in λ ZAP II 
(average insert length 2.9 kilobase pairs). Two different methods 
were used: first, dye-primer (M13-21 and M13 reverse primer) 
ready reaction kits, analysed on 48-cm 4% polyacrylamide 
gels; and second, dye-terminator (ABI Prism FS+) reactions using two 
pBluescript-specific primers. These reactions were analysed on 36-cm 5% 
Long-Ranger gels. 
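
The link between roughly 10,500 random reads and about 97% coverage can be checked with an idealized Poisson (Lander-Waterman style) model. The Methods do not say that this model was used, so it is brought in here purely as an illustration. The read length (557 bp, the average edited read length reported for the final assembly) and genome length (1,551,335 bp, the length given for the A. aeolicus database entry in the search output in this document) are borrowed as assumptions.

```python
import math


def expected_coverage(n_reads: int, read_length: float, genome_length: int) -> float:
    """Expected fraction of the genome covered by at least one read,
    assuming reads start uniformly at random (Poisson approximation)."""
    depth = n_reads * read_length / genome_length
    return 1.0 - math.exp(-depth)


# Assumed inputs: 10,500 reads, 557 bp per read, 1,551,335 bp genome.
fraction = expected_coverage(10_500, 557, 1_551_335)
print(f"expected coverage: {fraction:.3f}")
```

The model ignores cloning bias and the minimum overlap needed to join reads, so it is best read as an order-of-magnitude check rather than a prediction of contig counts.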

The sequence fragments were assembled on an Apple Power Macintosh 
computer using Sequencher (Gene Codes, Ann Arbor, MI), an assembly and 
editing program. Assembly was typically performed in batches of roughly 200- 
400 sequences, and was followed by inspection and editing of the assemblies. All 
sequences in the set were compared with all others through this process. After 
assembly, the sequences comprised ~750 contigs at the end of the random 
phase. Sequences were obtained from both ends of ~200 randomly chosen 
clones from a fosmid library 42,43 . These sequences were then assembled with 
consensus sequences derived from the contigs of random-phase sequences 
using Sequencher. Gaps between contigs were closed by direct sequencing on 
fosmids not wholly contained within a contig. The fosmid library thus served a 
purpose analogous to that of the λ-scaffold in other projects 1-4 . The final eight 
gaps were closed by direct sequencing of polymerase chain reaction (PCR) 
products generated with the TaqPlus Long PCR System (Stratagene Cloning 
Systems, La Jolla, CA). 

Consequences of reducing the number of sequences in the random phase are 
the large number of gaps that remain to be closed in the directed phase, and the 
reduction in overall coverage. To ensure that reduced coverage did not 
compromise accuracy, ~200 oligonucleotide primers were synthesized to 
resequence regions of ambiguity identified by visual inspection of the entire 
assembly. 13,785 sequences, with an average edited read length of 557 base 
pairs, constitute the final assembly. On the basis of a relatively small number of 
errors identified during the annotation process, we estimate the error frequency 
to be <0.01%, comparable to other published genomic sequence estimates. 
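
As a rough worked check of the figures in this paragraph, the mean sequencing redundancy and the upper bound on erroneous bases implied by the quoted error frequency can be computed directly. The genome length of 1,551,335 bp is taken from the database entry quoted elsewhere in this document and is an assumption in this context.

```python
def mean_depth(n_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average number of reads covering each base of the genome."""
    return n_reads * mean_read_length / genome_length


def max_expected_errors(genome_length: int, error_rate: float) -> float:
    """Upper bound on erroneous bases for a given per-base error rate."""
    return genome_length * error_rate


GENOME_LENGTH = 1_551_335  # assumed, from the A. aeolicus database entry

depth = mean_depth(13_785, 557, GENOME_LENGTH)
errors = max_expected_errors(GENOME_LENGTH, 0.0001)  # 0.01% expressed as a fraction
print(f"mean depth: {depth:.2f}x, at most about {errors:.0f} erroneous bases")
```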
Gene (ORF + RNA) identification and functional assignment approaches. 
Coding regions of the A. aeolicus genome were analysed and assigned using 
primarily the programs BLASTP 44 and FASTA 45 to search against a non- 
redundant protein database. Many analyses were carried out within the context 
of MAGPIE 46,47 , an integrated computing environment for genome analysis. 
The results of these analyses are available for user interpretation, validation, 
and categorization. Additional ORFs were identified and start sites refined 
using the program CRITICA (J. H. Badger and G.J.O., unpublished program). 
Finally, all presumed 'intergenic regions' were examined with BLASTX for 
similarities to known protein sequences 48 . Transfer RNA genes were identified 
with the program tRNAscan-SE 49 . 
WARNING: These microbial genomes are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 

NOTE: This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 

Commencing search, please wait for results. 

TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= deltaprime.ecoli 
         (334 letters) 

Searching...done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQ 

                                                              Score     E 
Sequences producing significant alignments:                   (bits)  Value 

gb|AE000657|AE000657 Aquifex aeolicus complete genome            68   8e-13 

gb|AE000657|AE000657 Aquifex aeolicus complete genome 
Length = 1551335 

Score =67.5 bits (162), Expect = 8e-13 

Identities = 39/136 (28%), Positives = 58/136 (41%) 

Frame = +1 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 

HA L G+G + L++ L C+ P + CG C C+ + G PD + 

Sbjct: 1303996 HAYLFAGPRGVGKTTIARILAKALNCKNPSKGEPCGECENCREIDRGVFPDLIEMDAASN 1304175 

Query: 85 ™ TL GVDAV*EVTEKLNEHARLGGAKVVW^ 144 

+ G+D + E +N G KV + EE pp TFT 

Sbjct: 1304176 R---GIDDV^LKEAVmKPIKGKYKVYIIDEAHMLTKEAFNALLKTLEEPPPRTVFVLC 1304346 

Query: 145 TREPERLLATLRSRCR 160 

T E +++L T+ SRC+ 

Sbjct: 1304347 TTEYDKILPTILSRCQ 1304394 
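
The summary lines that head each alignment (score, expectation value, identities) are regular enough to be read mechanically. The following is a minimal sketch of such a parser; the regular expressions are written against the layout shown in this report only and are not an official BLAST parsing API.

```python
import re

# Patterns tailored to the summary lines shown above (an assumption about layout).
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect\s*=\s*([0-9.eE+-]+)")
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def parse_hsp(text: str) -> dict:
    """Extract bit score, raw score, E-value and identity counts from the
    header lines of one high-scoring segment pair."""
    score = SCORE_RE.search(text)
    ident = IDENT_RE.search(text)
    if score is None or ident is None:
        raise ValueError("text does not look like an HSP summary")
    matches, length = int(ident.group(1)), int(ident.group(2))
    return {
        "bits": float(score.group(1)),
        "raw_score": int(score.group(2)),
        "expect": float(score.group(3)),
        "identities": matches,
        "alignment_length": length,
        "percent_identity": 100.0 * matches / length,
    }


example = ("Score = 67.5 bits (162), Expect = 8e-13\n"
           "Identities = 39/136 (28%), Positives = 58/136 (41%)")
hsp = parse_hsp(example)
print(hsp["bits"], hsp["expect"], int(hsp["percent_identity"]))
```

Truncating rather than rounding the percentage matches the way the identity figures in these reports appear to have been printed.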








WARNING: These microbial genomes are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 

NOTE: This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 



Commencing search, please wait for results. 



Database generously provided by The Institute for Genomic Research 
(TIGR). Their Policy on Early Data Release is: 

The Institute for Genomic Research (TIGR) releases data very rapidly to ensure that our scientific colleagues have access to 
information that may assist them in the search for genes and their biological function. Data releases do not constitute scientific 
publication, but rather provide investigators with information that may "jump-start" biological experimentation. Users of this 
information are encouraged to share their results with TIGR in order to improve annotation of the sequence data. Data or 
information may contain errors or be incomplete and should be regarded as preliminary. 

TIGR asks that you acknowledge the source of information obtained from this site in any publication by including the following 
sentence in both the Materials and Methods and Acknowledgement sections: "Preliminary sequence data was obtained from The 
Institute for Genomic Research website at http://www.tigr.org." Also include the following text in the Acknowledgements if 
applicable: "Sequencing of [organism name] was accomplished with support from [funding agency]." The name of the funding 
agency for each TIGR project can be found at http://www.tigr.org/tdb/mdb/mdb.html 

Similarly, if you display this data or any information derived from it on a Web page, we ask that you prominently display the 
following notice on that webpage: "Preliminary sequence data was obtained from The Institute for Genomic Research website at 
http://www.tigr.org". We request that you notify us of your electronic presentation by sending email to www@tigr.org 



TBLASTN 2.0.8 [Jan-05-1999] 

Reference: 
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= deltaprime.ecoli 
         (334 letters) 

Searching...done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQ 

                                                              Score     E 
Sequences producing significant alignments:                   (bits)  Value 

gb|AE000657|AE000657 Aquifex aeolicus complete genome            68   1e-12 
gb|AE000783|AE000783 Borrelia burgdorferi complete genome        47   2e-06 



gb|AE000657|AE000657 Aquifex aeolicus complete genome 
Length = 1551335 

Score = 67.5 bits (162), Expect = 1e-12 

Identities = 39/136 (28%), Positives = 58/136 (41%) 

Frame = +1 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 

HA L G+G + L++ L C+ P + CG C C+ + G PD + 

Sbjct: 1303996 HAYLFAGPRGVGKTTIARILAKALNCKNPSKGEPCGECENCREIDRGVFPDLIEMDAASN 1304175 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144 

+ G+D VR + E +N G KV + EEPP T F L 

Sbjct: 1304176 R GIDDVRALKEAVTTV'KPIKGKYKvTIIDEAHMLTKEAFNALLKTLEEPPPRTVFVLC 1304346 

Query: 145 TREPERLLATLRSRCR 160 

T E + + + L T+ SRC+ 
Sbjct: 1304347 TTEYDKILPTILSRCQ 1304394 



Score =43.0 bits (99), Expect = 4e-05 

Identities = 35/132 (26%), Positives = 56/132 (41%), Gaps = 28/132 (21%) 
Frame = +3 

Query: 27 LLIQALPGMGDDALIYALSRYLLCQQ--PQGHKSCGHCRGCQLMQA 70 

LL G G + + + +LC++ P G SC C+ + + 

Sbjct: 1082652 LLFYGKEGSGKTKTAFEFAKGILCKEWPWGCGSCPSCKHVNELEEAFFKGEIEDFKVYK 1082831 

Query: 71 GTHPDYYTLAPEKGKNTLGVmv^EVTEKLN^ 11 8 

G HPD+ + P + + + + +REV L KV+ + 

Sbjct: 1082832 DKDGKKHFVYLMGEHPDFWIIPSG— HYIKIEQIREVKNFAYVKPALSRRKVIIIDDAH 1083005 

Query: 119 XXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSR 158 

EEPPA+T F L T +L T+ SR 

Sbjct: 1083006 AMTSQAANALLKVLEEPPADTTFILTTNRRSAILPTILSR 1083125 

Score = 26.2 bits (56), Expect = 3.9 

Identities = 11/28 (39%), Positives = 15/28 (53%) 

Frame = -3 

Query: 32 LPGMGDDALIYALSRYLLCQQPQGHKSC 59 

LPG G+D +Y L+ Y + HK C 
Sbjct: 1283214 LPGSGEDFKVYFLTVYRNLTEEHFHKEC 1283131 

Score =25.1 bits (53), Expect =8.7 

Identities = 15/45 (33%), Positives = 21/45 (46%) 

Frame = +3 

Query: 285 RLQAILGDVCHIREQLMSVTGINRELLITDLLLRIEHYLQPGWL 329 

R+ +L D HIR LM +TGI +L + + H G L 
Sbjct: 120624 RVAVLLLDRKHIRYFLMDITGIEEKLDFLEPMTTRAHRFHSGGAL 120758 
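
The reading frame reported for each translated hit can be recovered from the subject coordinates. The sketch below uses a frame convention inferred from the hits in this report (plus-strand frames counted from the first base of the subject, minus-strand frames counted from its last base); that convention is an assumption, although it reproduces every frame printed in the alignments above.

```python
def tblastn_frame(start: int, end: int, subject_length: int) -> int:
    """Reading frame (+1..+3 or -1..-3) implied by 1-based subject coordinates
    as printed on the Sbjct lines (start > end on the minus strand)."""
    if start <= end:
        return (start - 1) % 3 + 1
    return -((subject_length - start) % 3 + 1)


def codons_spanned(start: int, end: int) -> int:
    """Number of codons covered by a nucleotide interval on either strand."""
    return (abs(end - start) + 1) // 3


AQUIFEX_LENGTH = 1_551_335  # "Length" line of the subject entry

print(tblastn_frame(1303996, 1304175, AQUIFEX_LENGTH))  # first segment, plus strand
print(tblastn_frame(1283214, 1283131, AQUIFEX_LENGTH))  # minus-strand segment
print(codons_spanned(1303996, 1304175))
```

The codon count for a Sbjct line equals the number of non-gap residues on that line, which is a quick way to spot transcription errors in the coordinates.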

gb|AE000783|AE000783 Borrelia burgdorferi complete genome 
Length = 910724 



Following those BLAST hits is the sequence of the contig containing the top 
hit. 



TBLASTN 2.0a19MP-WashU [14-Jul-1998] [Build linux-x86 18:51:45 30-Jul-1998] 
Reference: Gish, Warren (1994-1997). unpublished. 

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. 
Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10. 

Notice: statistical significance is estimated under the assumption that the 
equivalent of one complete reading frame of the database codes for protein and 
that significant alignments will involve only coding reading frames. 

Query=  delta prime 
        (334 letters) 

Database:  /usr/local/db/t_maritima 
           948 sequences; 2,352,161 total letters. 
Searching....10....20....30....40....50....60....70....80....90....100% done 

                                                                     Smallest 
                                                                       Sum 
                                                     Reading  High  Probability 
Sequences producing High-scoring Segment Pairs:        Frame Score  P(N)      N 

tm_26                                                     -2   204  3.7e-15   1 
tm_804                                                    +3   158  2.2e-10   1 
tm_19                                                     -1   133  3.4e-07   1 
tm_199                                                    +1    64  0.9999    1 



>tm_26 

Length = 18,920 

Minus Strand HSPs : 

Score = 204 (71.8 bits), Expect = 3.7e-15, P = 3.7e-15 
Identities = 56/202 (27%), Positives = 95/202 (47%), Frame = -2 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73 

+ + + Q H + GG L L++ L C+ +G + C CR C+ + GT 

Sbjct: 5536 IIGAIQKNSVAHGYIFAGPRGTGKTTLARILAKSLNCENRKGVEPCNSCRACREIDEGTF 5357 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLE 133 

D L + N G+D +R + + + QKV+ + +LT A NALLKTLE 

Sbjct: 5356 MDVIEL--DAASNR-GIDEIRRIRDAVGYRPMEGKYKVYIIDEVHMLTKEAFNALLKTLE 5186 

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWL SREVTMSQDALL 188 

EPP+ F LAT E++ T+ SRC++ P++ L + + + ++AL 

Sbjct: 5185 EPPSHWFVLATTNLEKVPPTIISRCQVFEFRNIPDELIEKRLQEVAEAEGIEIDREALS 5006 

Query: 189 AALRLS AGS PGAALALFQGDNWQARE 214 

+ ++G AL + + W+ E 
Sbjct: 5005 FIAKRASGGLRDALTMLE-QVWKFSE 4931 



>tm_804 

Length = 1007 
Plus Strand HSPs : 



Score = 158 (55.6 bits), Expect = 2.2e-10, P = 2.2e-10 
Identities = 41/143 (28%), Positives = 65/143 (45%), Frame = +3 



Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73 

+ + + Q H + G G+ L L++ L C+ G C CR C + GT 

Sbjct: 249 IIGAIQKNNVAHGYIFAGPRGTGNTTLAIILAKSLNCENRSGVDPCNSCRACIEIDEGTF 428 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLE 133 

D L + N G+D +R + + + G KV + + LT A NALLK +E 

Sbjct: 429 MDVIQL--DAASNR-GIDEIRRIRDAVGYKPMEGKYKVYIID*VHMLTMEAFNALLKAVE 599 

Query: 134 EPPAETWFFLATRE----PERLLATL 155 

EPP+ F L T E P ++++ + 

Sbjct: 600 EPPSHVMFVLVTSEL*NGPRKIISNM 677 





>tm_19 

Length = 24,312 

Minus Strand HSPs : 

Score = 133 (46.8 bits), Expect = 3.4e-07, P = 3.4e-07 
Identities = 36/97 (37%), Positives = 50/97 (51%), Frame = -1 

Query: 75 DYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALLTDAAANALLKTLEE 134 

D + PE G+N +G+D +R + + LN L K V V D +T AANA LK LEE 
Sbjct: 14943 DVLEIDPE-GEN-IGIDDIRTIKDFLNYSPELYTRKYVIVHDCERMTQQAANAFLKALEE 14770 

Query: 135 PPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQY 171 

PP L TR LL T++SR + P+++ 

Sbjct: 14769 PPEYAVIVLNTRRWHYLLPTIKSRV-FRWVNVPKEF 14662 



>tm_199 

Length = 1128 

Plus Strand HSPs: 

Score = 64 (22.5 bits), Expect =8.9, P = 1.00 

Identities = 21/85 (24%), Positives = 40/85 (47%), Frame = +1 

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWL SREVTMSQDALL 188 

EPP+ F LAT E++ T+ SRC++ P+ + L + + + ++AL 

Sbjct: 1 EPPSHWFVLATTNLEKVPPTIISRCQVFEFRNIPDELIEKRLQEVAEAEGIEIDREALS 180 

Query: 189 AALRLS AGS PGAALALFQGDNWQARE 214 

+ ++G AL + + W+ E 
Sbjct: 181 FIAKRASGGLRDALTMLE-QVWKFSE 255 
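
Each hit in this report carries both an Expect value and a probability P. Under the usual Poisson assumption for chance alignments the two are related by P = 1 - exp(-E), which is consistent with the pairs printed above (for example Expect = 8.9 alongside P = 1.00). The relation is supplied here as a standard statistical identity rather than something stated in the report itself.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number
    of chance hits (Poisson assumption). expm1 keeps precision for tiny E."""
    return -math.expm1(-expect)


for e_value in (3.7e-15, 2.2e-10, 3.4e-07, 8.9):
    print(e_value, p_from_expect(e_value))
```

For small values E and P are numerically indistinguishable, which is why the first three hits print the same number for both.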



Parameters: 
  B=5 
  ctxfactor=6.00 
  E=10 
The complete genome sequence 
of the gastric pathogen 
Helicobacter pylori 

Jean-F. Tomb*, Owen White, Anthony R. Kerlavage*, Rebecca A. Clayton*, Granger G. Sutton*, 
Robert D. Fleischmann*, Karen A. Ketchum*, Hans Peter Klenk*, Steven Gill*, Brian A. Dougherty*, 
Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding 
sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, 
and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins 
were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number of 
sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and 
dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses 
recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive 
evolution. Consistent with its restricted niche, H. pylori has a few regulatory networks, and a limited metabolic 
repertoire and biosynthetic capacity. Its survival in acid conditions depends, in part, on its ability to establish a positive 
inside-membrane potential in low pH. 




For most of this century the cause of peptic ulcer disease was 
thought to be stress-related and the disease to be prevalent in 
hyperacid producers. The discovery 1 that Helicobacter pylori was 
associated with gastric inflammation and peptic ulcer disease was 
initially met with scepticism. However, this discovery and sub- 
sequent studies on H. pylori have revolutionized our view of the 
gastric environment, the diseases associated with it, and the 
appropriate treatment regimens 2 . 

Helicobacter pylori is a micro-aerophilic, Gram-negative, slow- 
growing, spiral-shaped and flagellated organism. Its most charac- 
teristic enzyme is a potent multisubunit urease 3 that is crucial for its 
survival at acidic pH and for its successful colonization of the gastric 
environment, a site that few other microbes can colonize 2 . H. pylori 
is probably the most common chronic bacterial infection of 
humans, present in almost half of the world population 2 . The 
presence of the bacterium in the gastric mucosa is associated with 
chronic active gastritis and is implicated in more severe gastric 
diseases, including chronic atrophic gastritis (a precursor of gastric 
carcinomas), peptic ulceration and mucosa-associated lymphoid 
tissue lymphomas 2 . Disease outcome depends on many factors, 
including bacterial genotype, and host physiology, genotype and 
dietary habits 4,5 . H. pylori infection has also been associated with 
persistent diarrhoea and increased susceptibility to other infectious 
diseases 6 . 

Because of its importance as a human pathogen, our interest in its 
biology and evolution, and the value of complete genome sequence 
information for drug discovery and vaccine development, we have 
Table 1 Genome features 

General 
  Coding regions (91.0%) 
  Stable RNA (0.7%) 
  Non-coding repeats (2.3%) 
  Intergenic sequence (6.0%) 

RNA                                               Coordinates 
  Ribosomal RNA 
    23S-5S                                        445,306-448,642 bp 
    23S-5S                                        1,473,557-1,473,919 bp 
    16S                                           1,209,082-1,207,584 bp 
    16S                                           1,511,138-1,512,635 bp 
    5S                                            448,041-448,618 bp 
  Transfer RNA 
    36 species (7 clusters, 12 single genes) 
  Structural RNA 
    1 species (ssrD)                              629,845-630,124 bp 

DNA 
  Insertion sequences 
    IS605  13 copies (5 full-length, 8 partial) 
    IS606  4 copies (2 full-length, 2 partial) 
  Distinct G + C regions                          Associated genes 
    region 1 (33% G + C) 452-479 kb               IS605, 5S RNA and repeat 7; virB4 
    region 2 (35% G + C) 539-579 kb               cag PAI (Fig. 4) 
    region 3 (33% G + C) 1,049-1,071 kb           IS605, 5S RNA and repeat 7 
    region 4 (43% G + C) 1,264-1,276 kb           β and β' RNA polymerase, EF-G (fusA) 
    region 5 (33% G + C) 1,590-1,602 kb           two restriction/modification systems 
  Coding sequences 
    1,590 coding sequences (average 945 bp) 
    1,091 identified database match 
    499 no database match 
sequenced the genome of a representative H. pylori strain by the 
whole-genome random sequencing method as described for 
Haemophilus influenzae 7 , Mycoplasma genitalium 8 and 
Methanococcus jannaschii 9 . 

General features of the genome 

Genome analysis. The genome of H. pylori strain 26695 consists of a 
circular chromosome with a size of 1,667,867 base pairs (bp) and 
average G + C content of 39% (Figs 1 and 2). Five regions within the 
genome have a significantly different G + C composition (Table 1 
and Fig. 1). Two of them contain one or more copies of the insertion 
sequence IS605 (see below) and are flanked by a 5S ribosomal RNA 
sequence at one end and a 521 bp repeat (repeat 7) near the other. 
These two regions are also notable because they contain genes 
involved in DNA processing and one contains 2 orthologues of 
the virB4/ptl gene, the product of which is required for the transfer 
of oncogenic T-DNA of Agrobacterium and the secretion of the 
pertussis toxin by Bordetella pertussis 10 . Another region is the cag 
pathogenicity island (PAI), which is flanked by 31-bp direct repeats, 
and appears to be the product of lateral transfer 11 . 
RNA and repeat elements. Thirty-six tRNA species were identified 
using tRNAscan-SE 12 . These are organized into 7 clusters plus 12 
single genes. Two separate sets of 23S-5S and 16S ribosomal RNA 
(rRNA) genes were identified, along with one orphan 5S gene and 
one structural RNA gene (Table 1). Associated with each of the two 
23S-5S gene clusters is a 6-kilobase (kb) repeat containing a 
possible operon of 5 ORFs that have no database matches. 

Eight repeat families (>97% identity) varying in length from 0.47 
to 3.8 kb were found in the chromosome (Figs 1 and 2). Members of 
repeat 7 are found in intergenic regions, while the others are 
associated with coding sequences and may represent gene duplica- 
tions. Repeats 1, 2, 3 and 6 are associated with genes that encode 
outer-membrane proteins (OMP) (Fig. 3). 

Two distinct insertion sequence (IS) elements are present. There 
are five full-length copies of the previously described IS605 11,13 and 
two of a newly discovered element designated IS606. In addition, 
there are eight partial copies of IS605 and two partial copies of 
IS606. Both elements encode two divergently transcribed transpo- 
sases (TnpA and TnpB). IS606 has less than 50% nucleotide identity 
with IS605 and the IS606 transposases have 29% amino-acid 
identity with their IS605 counterpart. Both copies of the IS606 
TnpB may be non-functional owing to frameshifts. 
Origin of replication. As a typical eubacterial origin of replication 
was not identified 14 , we arbitrarily designated basepair one at the 
start of a 7-mer repeat, (AGTGATT) 26 , that produces translational 
stops in all reading frames, as this repeated DNA is unlikely to 
contain any coding sequence. 
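
The statement that the (AGTGATT) 26 repeat places stop codons in every reading frame is easy to verify computationally. The sketch below scans the three forward and three reverse-complement frames of the repeat for the standard stop codons TAA, TAG and TGA; frame labels are relative to the repeat itself, not to genome coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def frames_with_stop(seq: str) -> dict:
    """Map each of the six reading frames to True if it contains a stop codon."""
    result = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = (strand_seq[i:i + 3]
                      for i in range(offset, len(strand_seq) - 2, 3))
            result[f"{strand}{offset + 1}"] = any(c in STOP_CODONS for c in codons)
    return result


repeat = "AGTGATT" * 26
print(frames_with_stop(repeat))
```

Because the repeat unit is seven bases long, successive copies shift the frame by one position, so a single TGA or TAG in the unit ends up in all three frames of a strand within three copies.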

Open reading frames. One thousand five hundred and ninety 
predicted coding sequences were identified. They were searched 
against a non-redundant protein database resulting in 1,091 puta- 
tive identifications that were assigned biological roles using a 
classification system adapted from Riley 15 (Table 2). The 1,590 
predicted genes had an average size of 945 bp, similar to that 
observed in other prokaryotes 7-9 , and no genome-wide strand 
bias was observed (Fig. 2). More than 70% of the predicted proteins 
in H. pylori have a calculated isoelectric point (pI) greater than 7.0, 
compared to ~40% in H. influenzae and E. coli. The basic amino 
acids, arginine and lysine, occur twice as frequently in H. pylori 
proteins as in those of H. influenzae and E. coli, perhaps reflecting an 
adaptation of H. pylori to gastric acidity. 

Paralogous families. Ninety-five paralogous gene families com- 
prising 266 gene products (16% of the total) were identified 
(www.tigr.org/tdb/mdb/hpdb/hpdb.html). Of these, 67 (173 
proteins) have an assigned role. Sixty-four have only 2 members, 
while the porin/adhesin-like outer membrane protein family (Fig. 2) 
is the largest with 32 members. The largest number of paralogues 
with assigned roles fall into the functional categories of cell 



envelope, transport and binding proteins, and proteins involved 
in replication. The large number of cell envelope proteins may 
reflect either a reduced biosynthetic capacity or a need to adapt to 
the challenging gastric environment. 

Cell division and protein secretion 

The gene content of H. pylori suggests that the basic mechanisms of 
replication, cell division and secretion are similar to those of E. coli 
and H. influenzae. However, important differences are noted. For 
example, apparently missing from the H. pylori genome are 
orthologues of DnaC, MinC, and the secretory chaperonin, SecB. In 
oriC-type primosome formation, the DnaB and DnaC proteins form a 
B-C complex that delivers the DnaB helicase to the developing 
primosome complex 16 . The apparent absence of DnaC in H. pylori 
suggests that either a novel mechanism for recruiting DnaB exists or 
a DnaC orthologue with no detectable sequence similarity is 
present. Similar arguments can be made for other seemingly missing 
important functions. 

H. pylori has a classical set of bacterial chaperones (DnaK, DnaJ, 
CbpA, GrpE, GroEL, GroES, and HtpG). The transcriptional 
regulation of H. pylori chaperone genes is likely to be different 
from that in E. coli, as it seems not to have the sigma factors that 
upregulate chaperone synthesis in E. coli (heat-shock sigma 32 and 
stationary-phase sigma S). 

In addition to the SecA-dependent secretory pathway, H. pylori 
has two specialized export systems. One is associated with the cag 
pathogenicity island 11 and the other is the flagellar export pathway, 
which is assembled from orthologues of FliH, FliI, FliP, FlhA, FlhB, 
FliQ, FliR and FliP 17. Apparently absent from H. pylori is a type IV 
signal peptidase and orthologues of the dsbABC system, which in 
other species are required for the maturation of pili and pilin-like 
structures 18 and assembly of surface structures involved in virulence 
and DNA transformation 19. 

Recombination, repair and restriction systems 

Systems for homologous recombination and post-replication, 
mismatch, excision and transcription-coupled repair appear to be 
present in H. pylori. Also present are genes with similarity to 
DNA glycosylases which have associated AP endonuclease activity. 
The RecBCD pathway, which mediates homologous recombination 
and double-strand break repair, and RecT and RecE orthologues, 
proteins involved in strand exchange during recombination 20, appear 
to be absent. The ability of H. pylori to perform mismatch repair is 
suggested by the presence of methyl transferases, mutS and uvrD. 
However, orthologues of MutH and MutL were not identified. 
Components of an SOS system also appear to be absent. 
Bacteria commonly use restriction and modification systems to 
degrade foreign DNA. In H. pylori, this defence system is well 
developed with eleven restriction-modification systems identified 
on the basis of gene order and similarity to endonucleases, 
methyltransferases, and specificity subunits. Three type I, one type II and 
three type IIS systems were identified, as well as four type III 
systems, including the recently identified epithelial responsive 



Figure 1 Linear representation of the H. pylori 26695 chromosome illustrating the 
location of each predicted protein-coding region, RNA gene, and repeat element 
in the genome. Symbols are as follows: ++, Co2+, Zn2+, Cd2+; ?, unknown; 
D-alanine/glycine/D-serine; B12, B12/ferric siderophores; E, glutamate; 
molybdenum; P, proline; P/G, proline/glycine betaine; Q, glutamine; 
serine; a-k, a-ketoglutarate; a/o, arginine/ornithine; aa, amino acids (specificity 
unknown); aa2, dipeptides; aaX, oligopeptides; fum, fumarate, succinate; 
glucose/galactose; h, hemin; lac, L-lactate; mal, malate, 2-oxoglutarate; 
nicotinamide mononucleotides; pyr, pyrimidine nucleosides. Numbers associated 
with tRNA symbols represent the number of tRNAs at a locus. Numbers 
associated with GES represent the number of membrane-spanning domains 
according to the Goldman, Engelman and Steitz scale as calculated by To£ 
AMINO-ACID BIOSYNTHESIS 
General 
HP0695 hydantoin utilization protein A (hyuA) 28.6% 
Aromatic amino-acid family 
HP1038 3-dehydroquinase type II (aroQ) 99.4% 
HP0283 3-dehydroquinate synthase (aroB) 33.1% 
HP0134 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase (dhs1) 54.6% 
HP0401 3-phosphoshikimate 1-carboxyvinyltransferase (aroA) 63.6% 
HP1279 anthranilate isomerase (trpC) 47.0% 
HP1282 anthranilate synthase component I (trpE) 47.9% 
HP1280 anthranilate synthase component II (trpD) 42.5% 
HP1281 anthranilate synthase component II (trpD) 40.2% 
HP0663 chorismate synthase (aroC) 47.2% 
HP1380 prephenate dehydrogenase (tyrA) 30.2% 
HP1249 shikimate dehydrogenase (aroE) 36.6% 
HP0157 shikimic acid kinase I (aroK) 36.)% 
HP1277 tryptophan synthase, alpha subunit (trpA) 46.5% 
HP1278 tryptophan synthase, beta subunit (trpB) 66.1% 
Aspartate family 
HP0649 aspartate ammonia-lyase (aspA) 55.5% 
HP1189 aspartate-semialdehyde dehydrogenase (asd) 45.7% 
HP1229 aspartokinase (lysC) 48.0% 
HP0106 cystathionine gamma-synthase (metB) 47.7% 
HP0290 diaminopimelate decarboxylase (dap decarboxylase) (lysA) 42.7% 
HP0566 diaminopimelate epimerase (dapF) 30.0% 
HP0510 dihydrodipicolinate reductase (dapB) 95.3% 
HP1013 dihydrodipicolinate synthetase (dapA) 39.5% 
HP0622 homoserine dehydrogenase (metL) 37.7% 
HP1050 homoserine kinase (thrB) 27.7% 
HP0672 solute-binding signature and mitochondrial signature protein (aspB) 47.3% 
HP0212 succinyl-diaminopimelate desuccinylase (dapE) 42J% 
HP0626 tetrahydrodipicolinate N-succinyltransferase (dapD) gg,^ 
HP0098 threonine synthase (thrC) 32.9% 
Glutamate family 
HP0380 glutamate dehydrogenase (gdhA) 59.0% 
HP0512 glutamine synthetase (glnA) 4^6% 
HP1158 pyrroline-5-carboxylate reductase (proC) 28.9% 
Pyruvate family 
HP0941 alanine racemase, biosynthetic (alr) 32.4% 
HP1468 branched-chain-amino-acid aminotransferase (ilvE) 63.5% 
HP0330 ketol-acid reductoisomerase (ilvC) 43 j% 
Serine family 
HP0107 cysteine synthetase (cysK) 45.7% 
HP0096 phosphoglycerate dehydrogenase 3) b% 
HP0397 phosphoglycerate dehydrogenase (serA) 32.5% 
HP0736 phosphoserine aminotransferase (serC) 30.7% 
HP0652 phosphoserine phosphatase (serB) 36.5% 
HP1210 serine acetyltransferase (cysE) 93.5% 
HP0183 serine hydroxymethyltransferase (glyA) 54.0% 

BIOSYNTHESIS OF COFACTORS, PROSTHETIC GROUPS AND CARRIERS 
General 
HP0220 synthesis of [Fe-S] cluster (nifS) 48.0% 
Biotin 
HP0598 8-amino-7-oxononanoate synthase (bioF) 34.9% 
HP0976 adenosylmethionine-8-amino-7-oxononanoate aminotransferase (bioA) 49.2% 
HP1140 biotin operon repressor/biotin acetyl coenzyme A carboxylase synthetase (birA) 35.9% 
HP0407 biotin sulfoxide reductase (bisC) 42.7% 
HP1254 biotin synthesis protein (bioC) 32.1% 
HP1406 biotin synthetase (bioB) 35.2% 
HP0029 dethiobiotin synthetase (bioD) 35.0% 
Folic acid 
HP1036 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (folK) 34.6% 
HP0587 aminodeoxychorismate lyase (pabC) 32.4% 
HP1232 dihydropteroate synthase (folP) 34.5% 
HP1545 folylpolyglutamate synthase (folC) 35.2% 
HP0928 GTP cyclohydrolase I (folE) 50.9% 
HP0577 methylenetetrahydrofolate dehydrogenase (folD) 434^ 
HP0293 para-aminobenzoate synthetase (pabB) 35.1% 
Haem and porphyrin 
HP0163 delta-aminolevulinic acid dehydratase (hemB) 505^ 
HP0376 ferrochelatase (hemH) 33 4^ 
HP0306 glutamate-1-semialdehyde 2,1-aminomutase (hemL) 5i3qfa 
HP0239 glutamyl-tRNA reductase (hemA) 32.7% 
HP0665 oxygen-independent coproporphyrinogen III oxidase (hemN) A 2.4% 
HP1226 oxygen-independent coproporphyrinogen III oxidase (hemN) 37.9% 
HP0237 porphobilinogen deaminase (hemC) 45.7% 
u22^ protoporphyrinogen oxidase (hemK) 35.9% 
ud?^ 4 uroporphyrinogen decarboxylase (hemE) 46.3% 
HP1224 uroporphyrinogen III cosynthase (hemD) 27.6% 
Menaquinone and ubiquinone 
HP1360 4-hydroxybenzoate octaprenyltransferase (ubiA) 26.6% 
HP0929 geranyltranstransferase (ispA) 39 s% 
HP0240 octaprenyl-diphosphate synthase (ispB) 31.6% 
Molybdopterin 
HP0768 molybdenum cofactor biosynthesis protein A (moaA) 3^4% 
HP0798 molybdenum cofactor biosynthesis protein C (moaC) 97.9% 
HP0^2 molybdopterin biosynthesis protein (moeA) 36.3% 
molybdopterin biosynthesis protein (moeB) 32.2% 
HP0799 molybdopterin biosynthesis protein (mog) 60.8% 
HrTjeoi molybdopterin converting factor, subunit 1 (moaD) 3t1% 
HP0800 molybdopterin converting factor, subunit 2 (moaE) 3ll% 
HP0769 molybdopterin-guanine dinucleotide biosynthesis protein A (mobA) 28.3% 
Pantothenate 
HP1068 3-methyl-2-oxobutanoate hydroxymethyltransferase (panB) 4^ 7^ 
aspartate 1-decarboxylase (panD) 50.0% 
HP0006 pantoate-beta-alanine ligase (panC) 44.2% 
Pyridoxine 
HP:5£3 pyridoxal phosphate biosynthetic protein A (pdxA) 34.2% 
H?:£82 pyridoxal phosphate biosynthetic protein J (pdxJ) v&h 
H ^C2 GTP cyclohydrolase II (ribA) 47.2% 
HP060* GTP cyclohydrolase II/3,4-dihydroxy-2-butanone 4-phosphate synthase (ribA, ribB) 44.0% 
HP1505 riboflavin biosynthesis protein (ribG) 33.1% 
HP1087 riboflavin biosynthesis regulatory protein (ribC) 28.9% 
HP157* riboflavin synthase alpha subunit (ribC) 32.8% 
HP0002 riboflavin synthase beta chain (ribE) 52.4% 
Thioredoxin, glutaredoxin and glutathione 
2d!1« gamma-glutamyltranspeptidase (ggt) 61.2% 
HPK5S thioredoxin 
HP0824 thioredoxin (trxA) 6l5qb 
HPitfri thioredoxin reductase (trxB) 28.5% 
HP08W thiamin biosynthesis protein (thiF) 34.6% 
HPC6i3 thiamin phosphate pyrophosphorylase/hydroxyethylthiazole kinase (thiB) 35.7% 
HP08*5 thiamin phosphate pyrophosphorylase/hydroxyethylthiazole kinase (thiM) 37.9% 
HPC&« thiamine biosynthesis protein (thi) 41.0% 
Pyridine nucleotides 
HP0329 NH(3)-dependent NAD+ synthetase (nadE) 37.5% 
HP1355 nicotinate-nucleotide pyrophosphorylase (nadC) 35 
HP1356 quinolinate synthetase A (nadA) 34.2% 

CELL ENVELOPE 

Membranes, lipoproteins and porins 

HPU50 60 kDa inner-membrane protein 400% 

HP0T80 apolipoprotein N-acyttransferase (cute) 28.0% 

MP0175 cell binding factor 2 34 9% 

HP0G78 Hypothetical protein 29.4% 

HP0567 membrane protein 26 4% 
HPUS6 membrane-associated lipoprotein (Ipp20) 88.9% 

HPl5t>. outer membrane protein 39.9% 

HP00C9 outer membrane protein (ompl) 00% 

^a^ 24 o^er membrane protein (omplO) 00% 

HP0472 outer membrane protein (ompll) 99.5% 

outef membrane protein (ompl2) 00% 

HPC638 omer membrane protein (ompl3) 00% 

HPOoTi outer membrane protein (omp») 360% 

HPO^to outer membrane protein (omplS) 33,5% 

HP0722 outer membrane protein (ompl5) 43,3% 

HP0 If S outer membrane protein {ompl7) 433% 

f^Ppif 6 outer membrane protein (omplS) 0.0% 

HPC8S6 outer membrane protein (ompl9) . 36.6% 

HPQC25 outer membrane protein (omp2) 0.0% 

HP0912 outer membrane protein (omp20) 0.0% 

HPCSt3 outer membrane protein (omp21) 38.2% 

HP0S23 outer membrane protein (omp22) 0.0% 

HPlto? outer membrane protein (omp23) 0.0% 

HP1113 outer membrane protein (omp24) 35.0% 

HPti56 cuter membrane protein (omp25) 00% 

HP1157 outer membrane protein (omp26) 23.0% 

HPH77 outer membrane protein (omp27) 37Xfh 

HP1243 outer membrane protein (omp28) 0.0% 

HP1342 outer membrane protein (omp29) o!o% 

HP0079 outer membrane protein (omp3) 0.0% 

HP139S outer membrane protein (omp30) o!o% 

HPK^ outer membrane protein (omp3li 0.0% 

HP1501 outer membrane protein (omp32) 0J3% 

HP0127 outer membrane protein (omp4) rj.0% 

HP0227 outer membrane protein (omp6) 365% 

HP0229 outer membrane protein (omp6) 38.4% 

HP0252 outer membrane protein (omp7) 30 6% 

HP02&4 outer membrane protein (omp8) 375% 

HP0317 outer membrane protein (omp9) 36.3% 

H d^^ outer membran a protein PI (ompPl) 23.3% 
prolipoprotetn diacytgryoeryl transferase (lgt)34.4% 

HPoeso protective surface antigen 015 275% 

HP1S71 rare lipoprotein A (rtpA) 376% 

HP06iO toxirHike outer membrane protein 2a3% 

H Pp922 toxin-fike outer membrane protein 295% 

HP02S3 toxirHike outer membrane protein 30.6% 
Murein sacculus and peptidoglycan 
HP083O amkJase 

HP0733 0-alanine:D-alanine Sgase A (dd!A) 2a!5% 

HP0549 glutamate racemase (glr) 366% 
"??JI? ^acetylmuramoyl-L-alanine amidase [amiA)26.8% 

HP0537 penicillin-binding protein 1A (PBP-1A) 337% 

HPISkj penicillin-binding protein 2 (pbp2) 350% 
HPH25 peptidogtycan associated apoprotein precursor 

(omplS) 42.6% 
HP0433 phosphc^-a<^murarncyl-penta peptide- 
transferase (mraY) 452% 
HP0743 rod shape-determining protein (mreB) 37.7% 
HP 1373 red shape-determining protein (mreB) 619% 
uoH 7 ? sha Pe-d , etermining protein (mreC) 33.6% 
HP06^3 soluble lytic murein transgrycosylase (sit) 322% 
HP1543 toxR-activated gerie (tagE) 372% 
HPI5« ttxR-ectivated gene (tagE) 312% 
HP1I55 transferase, peptidoglycan synthesis 
(murG) 

HP0740 UDP-MurNac-pentapeptide presynthetase 

(murF) 25 ^ 

HP14S* UDP-MurNac-tripeptide synthetase (murE) 360% 
HP1418 UOP-N-awtytenolr^nj^ucosamU 

reduaase (murB) 32.7% 
HP0648 UOP-N-acetylglucosarnine encJpvruvyl 

transferase (murZ) 46.7% 
HP0623 UDP-Nn»cetylmurarnate-atanine figase 

(murC) 373% 
HP0494 UDP-N-acetylmuramoyla ta nine*0-gtutamate 

ligase (murO) 31i1<fc 
Surface polysaccharides, lipopolysaccharides and antigens 
HP0003 3K!ecx/-o^rrarir>o-octulosonic ectd B-phosphate 

synthetase (kdsA) 53.4% 
HF0957 3^eox>Mj-rnamo<)Ctulosontc-acid transferase 

frtfA) 359% 

HP0853 ADP-heptose synthase (rfaE) 40.6% 
HPtl9l AOP-hepto3e-lp3 heptosyttransferase U 

( rf aF) 332% 
HP08S9 AOP<^lycero-0^rvKyiep<ose^^rrBrase 

(rfaD) 32.7% 



h^S l?^f O-acetytation protein (.igi) 41.8% 
HP0326 CMP-N-acetyineurarnWc acid synmetase 

(neuA) ^. „ 

HP0230 Cn>CMP-3^eo*y-amanrKK)C^jtosona I e- 

cytidytyl-transferase fkdsB) « 
^ectin/f.bnrooervtoinding protein 257% 
HP0379 fucosyttransferase 

HP0651 fucosyttransferase gjj 
GDP^Vmannose dehydratase (rfbD) 621% 
HP0B67 frptd A disaccharioe synthetase ftoxBi 32 n* 
HP0t59 lipopotysacchar^ ll^yttriSferase 

(rfaJ) jg gq. 

HP02O8 Bpopolysaccharide ^fucosyttransferase 

(rfaJ) jg jq. 

HP0805 Bpooligosaccharide 5G8 epitope biosynthesis-' 

associated protein (Vw2B) 369 q b 
HP0S26 Bpooligosaccharide 5G8 epitope biosynthesis- ' 

associated protein pex2B) 392% 
HP M16 Bpopolysaccharide I2focosyttransferase 

(rtaJ) 292% 
HPOS79 Bpopolysaccharide biosynthesis protein 

(wbpB) 42^, 
-HP1475- Jipopotysaccharide cons biosynthesis protein 

(kdtB) 4g 
HP0279 Bpopolysaccharide heptosyftransferase-1 

(riaC) 3l7% 
HP0619 Bpopolysacharide biosynthesis otycosyl 

transferase (lic2B) 37^% 
HP lice LPS biosynthesis protein 28.7% 
HP1578 LPS biosynthesis protein 23 1% 

HFM581 methicaiin resistance protein (llm) 292% 
HP0857 prospnoheptose isomerase (gmhA) 44.5% 
HP1275 phosphomannomutase (algC) 

(Pseudomonas aeruginosa) 396% 
HP1429 potysialic acid capsuJe expression protein ' 

(kpsF) 480% 
HP0366 spore coat potysecchande biosynthesis 

protein C 35 3^ 

HP0178 spore coat polysaccnaride biosynthesis 

protein E 362% 
HP0421 type 1 capsular polysaccharide biosynthesis 

protein J (capj) « 
HP0196 UOP^^roxymyristOyl) glucosamine ' 

N-acyttransferase (IpxD) 39 50, 

HP1052 UOP-3-C-acyl N-acetylglcosamir>e deacetylase 

(envA) 44 go, 

HP1375 UDP^-acetytfltucosarmne acyttransferase 

(IpxA) 41.8% 
Surface structures 

HP0840 BaAl protein gQ2% 
2£Sff flaae||ar basaFbody L-ring protein (flgHJ 327% 
HP0351 flagellar basaFbody M^ng protein (fUF) 34 4% 
HP0246 flagellar basaHxxly P^*vg protein (flgl) 379% 
HP1557 flageftar basal-body protein (fliE) 370% 
HP1559 flagellar basat-body rod protein (flgB) 

(proximal rod protein) 3^0% 
HPI558 flagellar basal-body rod protein (flgC) 

(proximal rod protein) AGCfh 
HP1092 flagellar basal-body rod protein (flgG) 355% 
HP1585 flagellar basal-body rod protein (flgG) 477% 
HP1041 flagellar biosynthesis protein (flhA) 431% 
HP1035 flagellar biosynthesis protein (flhF) 355% 
HP0684 flagellar biosynthesis protein (flip) 43 4% 

HP0770 flagellar biosyntheiic protein (flhB) 33 7% 

HP0685 flagellar biosyntnetic protein (fliP) 65 6% 

urLi?' 9 fla 9 ellar biosynthetic protein (fliQ) 52^3% 
HP0173 flagellar biosynihetic protein (fliR) 4% 
E J»8e|;ar export protein («H) 29.1% 
u££™ ? 3 9 eUar ex P° rt Protein ATP synthase (flil) 476% 
HPO870 flagellar hook (flgE) oaqo, 
HPO908 flagellar hook (ftgE) ^ 
HP1119 flagellar hook-associated protein 1 

(HAPl) (flgK) 27.6% 
HP0752 flagellar hook-associated protein 2 (fliD) 28 9% 
HP0815 flagellar motor rotation protein (motA) 329% 
HP0816 flagellar motor rotation protein (motB) 29 7% 
S?S 522" motor swteh ***** (fliG) 370% 
HP1031 flagellar motor switch protein (fliW) 34 4% 

HP0753 flagellar protein (fliS) 3V3J 
flafiellar protein G (flaG) 23^3% 
HP0797 flagellar sheath adhesin hpaA 98^% 
HP0584 flagellar switch protein (fliN) 39 7% 

HP0601 flagellin A (flaA) S 
HP0115 flagenin B (flaB) 

HP0295 flagellin B homologue (fla) 32 9% 

HP1575 flhB protein (flhB) 

HP1030 fliY protein (fliY) jgJJ 

HPC907 Hook assembly proteia Aagefla (flgO) 25.5% 

HP1274 paralysed flageila protein (pflA) 23^% 

HP0751 polar flagellin (flaG) 

HP04IO putative neuramir^aaose-binding 

ljo..^ naemaggtutinin hwnotogue (hpaA) 242% 

t£ IE secreted Pro 181 " involved in flagellar motaity72.5% 

HPM62 secreted protein involved in flagellar motinty962% 

HP0232 secreted protein invofved in flagellar motillty992% 

CELLULAR PROCESSES 
General 

HP00I9 chemotaxis protein (cheV) 265% 

HP0393 chemotaxis protein (cheV) 3] 7% 

HP0616 chemotaxis protein (cheV) 279% 

HP 1067 chemotaxis protein (cheY) 992% 

HP0517 GTP-binding protein (era) 95 6% 

HP 1490 haemolysin 39 

HP 1086 Haemorysin (tty) 402% 
HP0599 haemorysin secretion protein precursor 

(hylBJ 45 4% 

HP0392 histidine kinase (cheA) 4^4% 
HP0099 methyt-accepting chemotaxis protein (flpA) 325% 
™ methyl-accepting chemotaxis protein (dpB) 30.7% 
HP0082 methyl-accepting chemotaxis transducer 

(tipO .232% 

HP0391 purine-binding chemotaxis protein (cheW) 34 j% 
Cell division 

HP0331 cell division inhibitor (rrunO) 502% 

d^^wn membrane protein (ftsX) 25.7% 

HP0978 ceu division protein (ftsA) protein 3X9% 

HP0748 cell division protein (ftsE) 37.6% 

HP0286 cell division protein (ftsH) 4L2% 

HP1069 cell division protein (ftsH) 90 « 

HPt556 cell division protein (ftsf) 30'^ 

HP1090 cell division protein (ftsK) 39JJ 

«" «™ion protein (ftsW) Escherichia coC 32.7% 
HP0763 ceS division protein (ftsY) 
HP0332 cell division topological specificity factor 

HP0979 cell division protein (ftsZ) 43.3% 
HP1159 cell filamentation protein (fic) 63.2% 
Cell killing 

HP0887 vacuolating cytotoxin 94.7% 
Chaperones 

HPOOiO chaperone and heat shock protein (groEL) 99.6% 
chaperone and heai shock protein TO 
(dnalQ 

chaperone and heat shock protein C82.5 
(htpG) 

cc-chaperone (groES) 
co-chape rone and heat-shock protein 
(dnaJ) 

co-chaperone and heat-shock protein 
(MPE) 



63.4% 



465% 
992% 



42.7% 



allcyt hydroperoxide reductase (tsaA) 
catalase 



HP0109 
HP0210 



HP0011 
HP1332 



HP01I0 

33j0% 

HP 1024 ccH^aperone-curved DNA-binding protein A 

(CbpA) 37.7% 
Chromosome-associated protein 

HP1138 plasmid reDltatwn^ailriion related protein 40.4% 
Detoxification 
HP 1563 
HP0875 

HP0267 rjhtorohydroiase 
HP0243 neutrophil activating protein (napA) 

(baaerioferritin) 
HP0389 superoxide dismutase (sod8) 
HP 1452 thiophene and furan oxidizer (tdhF) 
Protein and peptide secretion 
HP0355 GTP-binding membrane protein (tepA) 
HP0074 apoprotein signal peptidase (tepA) 
HP0786 preprotein transtocase subunit [secAJ 
HP 1300 preprotein transtocase subunit (secY) 
HP1255 protein translocation proteax low temperature 

(secG) 3fj 
HP 1550 protein-export membrane protein (secO) 
HP 1549 protein-export membrane protein (secf) 
HP0576 signal peptidase I (lepB) 
HPH52 signal recognition particle protein (flh) 
HP0795 trigger factor (tig) 
Transformation 

HPO520 cag pathogenicity island protein (cagl) 
cag pathogeniotry island protein (cag 10) 
cag pathogenicity island protein (cagtl) 
cag pathogenicity island protein (cag 12) 
cag pathogenicity island protein (cag 13) 
cag pathogenicity island protein (cag 14) 
cag pathogenicity island protein (cagt5) 
cag pathogenicity island protein (cag 16) 
cag pathogeniciry island protein (cag 17) 
cag pathogeniciry island protein (cag 18} 
cag pathogenicity island protein (cag 19) 
cag pathogenicity island protein (cag2) 
cag pathogenicity island protein (cag20) 
cag pathogenicity island protein (cag21) 
cag pathogenicity island protein (cag22J 
cag pathogenicity island protein (cag23) 
cag pathogeniciry island protein (cag24) 
cag pathogenicity island protein (cag25) 
cag pathogenicity island protein (cag26) 
cag pathogenicity island protein (cag3) 
cag pathogeniciry island protein (cag4) 
cag pathogenicity island protein (cag5) 
cag pathogenicity island protein (cag6) 
cag pathogenicity island protein (cag7) 
cag pathogeniciry island protein (cagS) 
cag pathogenicity island protein (ceg9) 
competence lipoprotein (comL) 
competence locus E (com£3) 
conjugal transfer protein (traG) 



HP0530 
HP0531 
HP0532 
HP0534 
HP0535 
HP0536 
HP0537 
HP0538 
HP0539 
HP0540 
HP0521 
HP0541 
HP0542 
HP0543 
HP0544 
HP0545 
HP0546 
HP0547 
HP0522 
HP0523 
HP0524 
HP0526 
HP0627 
HP0528 
HP0529 
HP 1378 
HP 1361 
HP1006 
HP1421 
HP0333 
HP0042 
HP0525 
HP0441 
HP0017 
HP0459 



985% 
99.4% 
42.6% 

96 3% 
98.6% 
37.6% ' 

573% 
870% 
54_0% 
412% 



351% 
403% 
414% 



98.4% 
972% 
98.9% 

983% 

' 976% 
96.4% 
983% 
953% 
98,7% 



978% 
979% 
95.5% 
99.0% 
985% 
95.7% 
923% 
98.1% 
95.7% 
99.1% 
975% 
94.6% 



255% 
26.7% 
273% 



conjugative transfer reguton protein (trbB) 30.7% 



DNA processing chain A (dprA) 
trbl protein 
virBIt homotogue 
VirB4 homotogue 
virB4 homotogue (virB4) 
virB4 homotogue (virB4) 



323% 
314% 
1003% 
235% 
252% 
253% 



CENTRAL INTERMEDIARY METABOLISM 
General 

HP 1014 7-a-hydroxysteroid dehydrogenase (hdhAJ 332% 
HP1186 carbonic anhydrase 370% 
HP0CO4 carbonic anhydrase (cfA) 333% 
HP0669 hydrogenase expression /lorrnation protein 

(nypA) 28.1% 
HP0900 hydrogenase expression/formation protein 

(hypBJ 414% 
hydrogenase expression/formation protein 
(hypC) 385% 
hydrogenase expression/formation protewt 
(nypD) 478% 
hydrogenase expression/formation protein 
(nypE) 39.7% 
S-adenosylmethionine synthetase 2 (metX) 62.1% 
Amino sugars 

HP1532 glucosamine fructoses-phosphate 

aminotransferase (isomerizing) (glmS) 
Phosphorus compounds 
HPO620 inorganic pyrophosphatase (ppa) 
HP0696 N-methylhydantoinase 
HP1010 pofyphosphate kinase (ppk) 
Polyamine biosynthesis 
HP0422 arginine decarboxylase (speA) 
HP0020 carboxynorspermidine decarboxylase 

(nspC) 455% 
spermidine synthase (speE) 265% 



HP0047 
HP0197 



417% 



603% 
263% 
38.5% 



333% 



HP0632 
Other 
HP0070 
HP0C69 
HP0068 
HP0067 
HP0071 
HP0073 

HP0072 



urease accessory protein (ureE) 971% 

urease accessory protein (ureF) 945% 

urease accessory protein (ureG) 95.0% 

urease accessory protein (ureH) 962% 

urease accessory protein (urel) 685% 
urease alpha subunit (ureA) 

(urea amktohydrotase) 100.0% 
urease beta subunit (urea amidohydrotase) 

(ureB) t0O0% 

urease protein (ureC) 9&0% 



ENERGY METABOLISM 
Aerobic 
HPX222 
HP0961 



HP0037 
HP 1269 



0-tactate dehydrogenase (did) 270% 
tfycerol<H)hosphate dehydrogenase 
(NAD(PH 363% 
NAOH-ubiquinone oxidoreductase subunit 19.*% 
NAOH-ubiquinone oxidoreductase, NQOlO 



HP1272 



HP 1273 
HP1266 



HP 1263 
HP1262 



HP 1261 
HP1260 



HP 1267 
HP 1268 



HP0294 
HP1238 
HP 1399 
HP0943 
HP0056 



subunit (NOOlO) -10% 
NADH^Aiquinone oxidoreductase. NOOIl 
subunit (istO0 1 1 ) ((Paracoccus denitrificans} 42.6% 
NADH-ubtquinone oxidoreductase. N0012 
subunit (N0012) 432% 
I^H-ubiq«none oxidoreductase. NQ013 

402% 



subunit (NQ013) 

rWOH-ubiquinone oxidoreductase. 
N0014 subunit (N0O14) 
NAW-ubiquinone cocidoreductase. NQ03 
Subunit (NQ03) 

NAT>NAiqiiinc« ojodoreductase. 
NQ04 subunit (N004)fTrlticum aestivum) 44.6% 
I^H-ubiquinone oxkxxeductase, NCOS 
subunit (NQ05) 

NADH^bio^none oxidoreductase. NQ06 
subunit (N006) 

I^H-ubiquinone oxidoreductase. N0O7 
Subunit (NQ07) 

NADH^iquinone oxidoreductase. NQ08 
Subunit (NQ08) 

NADH-ubiquinone oxidorBductase, NQ09 
Subunit (NQ09) 
Amino acids and amines 
HP 1398 alanine dehydrogenase (aid) 
aliphatic amylase (aimE) 
aliphatic amidase (aimE) 
arginase (rocF) 

D-amino acid dehydrogenase (dadA) 
delta-l-pyrrofine-S-cartxwylate dehydrogenase 
(Syriechocystis sp.) 322% 
L-asparaginase D (ansB) 64.1% 
L-serine deaminase (sdaA) 45.8% 



312% 
316% 



-10% 

622% 

40.7% 

42.4% 

412%* 

39.6% 
75.4% 
372% 
318% 
262% 



HP0723 
HP0132 
Anaerobic 
HP0666 



HPO590 
HP0591 
HP0193 



69.4% 
70.8% 



41.0% 
43.7% 



anaerobic gtyoefol-3-phosphate dehydrogenase. 
Subunit C (glpC) 272% 
ferredoxin oxidoreductase, alpha subunit 42.7% 
ferredoxin oxidoreductase. beta subunit 432% 
ferredoxin oxidoreductase, gamma subunit 333% 
fumarate reductase, cytochrome b subunit 
_ CWC) 583% 
HP0192 fumarate reductase, flavoprotein subunit 
(frdA) 

HP0191 fumarate reductase, irorwutfur subunit 
(frdS) 

HPlllO pyruvate ferredoxin oxidoreductase. alpha 
subunit 

HP1111 pyruvate ferredoxin cocidoreductase. beta 
subunit 

HPH09 pyruvate ferredoxin oxidoreductase. delta 

subunit 470% 
HP 1103 pyruvate ferredoxin oxidoreductase. gamma 

subunit 372% 
ATP-proton motive force interconversion 
HP0828 ATP synthase F0, subunit a (atpB) 37.7% 
HP1136 ATP synthase F0, subunit b (atpF) 28.3% 
HP1137 ATP synthase F0, subunit b' (atpF') 32.5% 
HP1212 ATP synthase F0, subunit c (atpE) 41.2% 
HP1134 ATP synthase F1, subunit alpha (atpA) 62.7% 
HP1132 ATP synthase F1, subunit beta (atpD) 85.3% 
HP1135 ATP synthase F1, subunit delta (atpH) 24.6% 
HP1131 ATP synthase F1, subunit epsilon (atpC) 32.7% 
HP1133 ATP synthase F1, subunit gamma (atpG) 37.3% 
Electron transport 

HP0146 cbb3-type cytochrome c oxidase subunit Q 

(CcoO) 442% 
HP0265 cytochrome c biogenesis protein (ccdAJ 35.4% 
HP0378 cytochrome c biogenesis protein (ycfS) 375% 
HP0H7 cy:cchrorne c oxidase, diheme subunit 

membrane-bound (fixP) 333% 
HP0144 cytochrome c oxidase, heme b and copper- 
binding subunit, membrane-bound (fixN) 43.9% 
HP0145 cytochrome c oxidase, monoheme subunit. 

membrane-bound (fixO) 45.7% 
HP 1461 cytochrome c551 peroxidase 48 5% 

HP1227 cytochrome c553 38.4% 
HP0277 ferredoxin 625% 
HP0588 ferrodoxin-tike protein 42.6% 
HP1508 ferrodoxin-iike protein 294% 
HP 1161 Bavodoxin (fldA) 47.0% 
HP0642 NAD(P)H^avin oxidoreductase 46.1% 
HP0954 cixygen-insensitive NAD(P)H nitroreductase 32.7% 
HP0634 ciu-none-reactive Ni/Fe hydrogenase (hydO)54.7% 
HP0633 cn^none-reactive Ni/Fe hydrogenase, cytochrome 

b subunit (hydC) 614% 
HP0632 quinone^eactive Ni/Fe hydrogenase, large 

subunit (hydB) 68.5% 
HP0631 quinone-reactive Ni/Fe hydrogenase. small 

Subunit (hydAJ 683% 
HP1539 ufciquinoJ cytochrome c oxidoreductase. 

c/.ochrome b subunit (fbcH) 393% 
HP 1538 ubiquinol cytochrome c cuddoreductase. 

cytochrome cl subunit (fbcH) 28.8% 
HP1540 ufckjuinol cytochrome c oxidoreductase, 

aeske 2Fe-2S subunit (fbcF) 392% 
Entner-Doudoroff 

HP 1099 2-te:o-3^eoxy^-prK)sphoghjcoriate aldolase 

60.3% 

HP1100 6-phosphogluconate dehydratase 60.7% 
Fermentation 

HP0691 3-cxoadipa:e coA-transferase subunit A 

GrtD) 655% 

HP0692 3-oxoadipate coA-transferase subuntt B 

t/\E» 732% 
HP0903 acetate kinase (ackA) (Escherichia coli) 423% 
HP0904 phosphate acetyltransferase (pta) 610% 
HP0905 phcsphotrarisacetylase (pta) 26.9% 
HP0357 short-chain alcohol dehydrogenase 673% 
Gluconeogenesis 

HP1385 fructose-16-bisphosphatase 36.4% 
r^.osphoenolpyruvaie synthase (ppsA) 62.4% 
phcsphogfycerate kinase 473% 



HP0121 
HP 1345 
Glycolysis 
HP0154 
HP0176 
HP 1103 
HP 1166 
HP0921 



encJase (eno) 553% 

fructose-bisphosphate aldolase (tsr) 463% 

gtucokinase (gik) 415% 

g!ucose*pnosphate isomerase (pgl) 633% 
g^aWehyde-3-ohosphate dehyclrogenase • 

(Sap) 465% 
gtyceraldehyde-3-phosphate o^hvclrogenase 

(fiap) 46.7% 

pfxjsphrjgrycerate mutase (pgm) 44.6% 

triosephosphaie teomerasa (tpi) 345% 
Pentose phosphate pathway 

HP 1386 Or^lose^rxjsphate 3 epimerase (rpe) 442% 
HP1102 glucc^e^phosphaie 1 -dehydrogenase 



HP 1346 



HP0974 
HP0194 



HP11OI 

HP1495 

HP 1088 

HP03S4 

Sugars 

HP0574 

HPO360 

TCA cycle 

HP0779 

HP0026 

HP1325 

HP0509 

HP0027 



(devB) 

gtucoso-Sphosphate dehydrogenase 
(g6p0) 

transaidotase (tal) 
transketolase A (tktA) 
transketotase B (tktS) 

galactosidase acetyltransferase (lacA) 
UDP-glucose 4-epimerase 

econitase B (acnB) 

citrate synthase (gftA) 

fumarase (fumC) 

glycoiate oxidase subunit (gteO) 

•socitrate dehydrogenase (icd) 



29.2% 

36.7% 
335% 
46.7% 
39.7% 

41.0% 
431% 

64.0% 
47.8% 
63.7% 
980% 
70.7% 



FATTY ACID AND PHOSPHOLIPID METABOLISM 

General 

(3RHiydrcwyrnyristoyHacyl carrier protein) 
dehydratase (fabZ) 474^ 
1 -acyt^tycefo(-3-pho3phate acyftransf erase 
(pIsC) (Escherichia coli) 32 0^ 

3-ketoacyt-acyl carrier protein reductase 
(fabG) 457% 
acetyl coenzyme A acetyltransferase 
(thiolase) (fadA) szcfh 
ecetyl-CoA carboxylase beta subunit 
(accD) 49 4qb 

acetyMDoA synthetase (acoE) 52.3% 
acetyl-coenzyme A carboxylase (accA) 503% 
acyl carrier protein (acpP) 55 3% 

acyt carrier protein (acpP) 55 3% 

beta ketoacyt-acyl carrier protein synthase II 
< fabF ) 50.0% 
beta-ketoacyt-acyl carrier protein synthase III 
(fabH) 444^ 

biotin carboxyf carrier protein (fabE) 30.8% 

btotin carboxylase (accC) 521% 

COP-digfyceride hydrolase (cdh) 733% 

CDP-diglyceride synthetase (cdsA) 42.4% 

cyclopropane fatty acid synthase (cfa) 39 7% 

diacyfglycerol kinase (dgkA) 45 8% 
enoyl-facyl-carrier-pfotein) reductase (NADH1 

( fat ») 458% 
fatty actd/phospholtpid synthesis protein 

(P«sX) 376% 

Hoto-ecp synthase (acpS) 29.1% 
malonyt coenzyme A-acyl carrier protein 

transacytase (fabD) 35.4% 
phosphaticylgrycerophosphate synthase 

(PflsA) 35.4% 
phosphatidylserine decarboxylase proenzyme 

(P*J) 332% 

phosphatidylserine synthase (pssA) 99.6% 
phospholipase A1 precursor 

(DR-phospholipase A) 33.3% 

PURINES, PYRIMIDINES, NUCLEOSIDES AND NUCLEOTIDES 
General 

HP0757 beta-alanine synthetase homotogue 40.0% 
2'-Deoxyribonucleotide metabolism 



HP 1376 

HP 1348 

HP0561 

HP0690 

HP0950 

HP1045 
HP0557 
HP0559 
HP0962 
HP0558 

HPO202 

HP0371 
HPO370 
HP0871 
HP0215 
HP0416 
HP0700 
HP0195 

HP0201 

HP0608 
HP0090 

HP 1016 

HP1357 

HP 1071 
HP0499 



HP0372 



HP0680 



deoxycytidine triphosphate deaminase 
(dcd) 282% 
deoxyuridine SO^triphosphate nucleotidohydrotase 

(dut) 4,4^ 

ribonucleoside diphosphate reductase, beta 
subunit (nrdB) 39.0% 
ribonudeoside-diphosphate reductase i alpha 

28.4% 
45.9% 



(nrdA) 

HP0825 thioredoxin reductase (trxB) 
Purine ribonucleotide biosynthesis 
HP0321 50guanytate kinase (gmk) 
adenylate kinase (adk) 
adenylosuccinate lyase (purB) 
adenylosuccinate synthetase (purA) 
formyftetrahydrofolate hydrolase (purU] 
gfycinamide ribonucleotide syntfietase 
(PurD) 

GMP reductase (guaC) 
GMP synthase (guaA) 
inosine50-nxxx>phosphate dehydrogenase 



HP0618 
HP1112 
HP0255 
HP 1434 
HP12I8 

HP0854 
HP0409 
HP0829 



33.3% 
49.5% 



31.8% 
313% 
56.1% 

585% 
67.7% 



HP0198 nucleoside diphosphate kinase (ndk) 
HP0742 r^sprxjribosyfpyrophosphate synthetase 

(praA) ges^ 
HP1530 purine nucleoside pnosphorylase (punS) 20.7% 
Pyrimidine ribonucleotide biosynthesis 
HP1084 aspartate transcarbamoylase (pyrB) 38.7% 
HP0919 carbamoyl-phosphate synthase (glutamine-hydrolysing) (pyrAb) 48.6% 
HP1237 carbamoyl-phosphate synthetase (pyrAa) 39.7% 
HP0349 CTP synthetase (pyrG) 50.7% 
HP0266 dihydroorotase (pyrC) -io% 
HP0581 dihydroorotase (pyrC) 31.5% 
HP1011 dihydroorotate dehydrogenase (pyrD) 4^5% 
HP1257 orotate phosphoribosyltransferase (pyrE) 35.5% 
HP0005 orotidine 5'-phosphate decarboxylase (pyrF) 39.0% 
HP1474 thymidylate kinase (tmk) 33.9% 
HP0777 uridine 5'-monophosphate (UMP) kinase (pyrH) 50.4% 
Salvage of nucleosides and nucleotides 
HP0104 2',3'-cyclic-nucleotide 2'-phosphodiesterase (cpdB) 31.8% 
HP0572 adenine phosphoribosyltransferase (apt) 50.3% 
HP1179 phosphopentomutase (deoB) 55.9% 
HP1178 purine-nucleoside phosphorylase (deoD) 55.6% 
HP0735 xanthine guanine phosphoribosyl transferase (gpt) 27.1% 
Sugar-nucleotide biosynthesis and conversions 
HP0043 mannose-6-phosphate isomerase (pmi) or (algA) 42.8% 
HP0045 nodulation protein (nolK) 44.3% 
HP0646 UDP-glucose pyrophosphorylase (galU) 65.6% 
HP0683 UDP-N-acetylglucosamine pyrophosphorylase (glmU) 40.0% 

REGULATORY FUNCTIONS
General
HP1032 alternative transcription initiation factor, sigma-F (fliA) 34.6%
HP1168 carbon starvation protein (cstA) 69.8%
HP1442 carbon storage regulator (csrA) 43.3%
HP1027 ferric uptake regulation protein (fur) 39.9%
HP0278 guanosine pentaphosphate phosphohydrolase 26.4%
HP0400 penicillin tolerance protein (lytB) 30.6%
HP0775 penta-phosphate guanosine-3'-pyrophosphohydrolase (spoT) 36.7%
HP0524 peptide methionine sulfoxide reductase (msrA) 66.8%
HP1025 putative heat shock protein (hspR) 46.2%
HP1572 regulatory protein DniR 31.3%
HP0703 response regulator 44.5%
HP1021 response regulator 28.7%
HP1043 response regulator 26.8%
HP1365 response regulator 32.*%
HP0166 response regulator (ompR) 51.0%
HP0714 RNA polymerase sigma-54 factor (rpoN) 37.1%
HP0068 RNA polymerase sigma-70 factor (rpoD) 43.5%
HP0792 sigma-54 interacting protein 97.7%
: tPOtW signal-transducing protein, histidine kinase 27.1%
HP1364 signal-transducing protein, histidine kinase 24.9%
HP0244 signal-transducing protein, histidine kinase (atoS) 30.0%
HP0048 transcriptional regulator (hypF) 34.5%
HP1287 transcriptional regulator (tenA) 34.7%
HP0727 transcriptional regulator, putative 33.3%

REPLICATION
Degradation of DNA
HP0275 ATP-dependent nuclease (addB) 27.2%
HP0259 exonuclease VII, large subunit (xseA) 37.6%
DNA replication, restriction, modification, recombination and repair
HP0142 A/G-specific adenine glycosylase (mutY) 38.5%
HP0050 adenine specific DNA methyltransferase (dpnA) 37.4%
HP0910 adenine specific DNA methyltransferase (HINDIIM) 33.4%
HP1352 adenine specific DNA methyltransferase (HINFIM) 62.5%
HP0263 adenine specific DNA methyltransferase (hpaim) 33.9%
HP0481 adenine specific DNA methyltransferase (MFOKI) 29.3%
HP0260 adenine specific DNA methyltransferase (mod) 33.9%
HP0593 adenine specific DNA methyltransferase (mod) 38.5%
HP1522 adenine specific DNA methyltransferase (mod) 42.5%
HP0473 adenine specific DNA methyltransferase (VSPIM) 41.1%
HP0054 adenine/cytosine DNA methyltransferase 32.1%
HP0790 anti-codon nuclease masking agent (prrB) 42.5%
HP1529 chromosomal replication initiator protein (dnaA) 34.9%
HP1121 cytosine specific DNA methyltransferase (BSP6IM) 37.0%
HP0051 cytosine specific DNA methyltransferase (DDEM) 39.0%
HP0483 cytosine specific DNA methyltransferase (HPHIMC) 38.7%

HP0701 DNA gyrase, sub A (gyrA) 97.4%
HP0501 DNA gyrase, sub B (gyrB) 46.0%
HP1478 DNA helicase II (uvrD) 35.3%
HP0548 DNA helicase, putative 38.6%
HP0615 DNA ligase (lig) 40.1%
HP0621 DNA mismatch repair protein (MutS) 32.6%
HP1470 DNA polymerase I (polA) 40.0%
HP1460 DNA polymerase III alpha-subunit (dnaE) 42.0%
HP0500 DNA polymerase III beta-subunit (dnaN) 26.0%
HP1231 DNA polymerase III delta prime subunit (holB) 48.6%
HP1387 DNA polymerase III epsilon subunit (dnaQ) 35.1%
HP0717 DNA polymerase III gamma and tau subunits (dnaX) 39.0%
HP0012 DNA primase (dnaG) 36.6%
HP1523 DNA recombinase (recG) 32.7%
HP1393 DNA repair protein (recN) 28.3%
HP0116 DNA topoisomerase I (topA) 45.1%
HP0440 DNA topoisomerase I (topA) 31.7%
HP0602 endonuclease III 36.6%
HP0585 endonuclease III (nth) 40.1%
HP0705 excinuclease ABC subunit A (uvrA) 63.4%
HP1114 excinuclease ABC subunit B (uvrB) 63.1%
HP0821 excinuclease ABC subunit C (uvrC) 31.5%
HP1526 exodeoxyribonuclease (lexA) 58.9%
HP0213 glucose inhibited division protein (gidA) 48.5%
HP1063 glucose-inhibited division protein (gidB) 32.9%
HP1553 helicase 33.0%
HP0683 Holliday junction DNA helicase (ruvA) 39.0%
HP1059 Holliday junction DNA helicase (ruvB) 54.6%
HP0877 Holliday junction endodeoxyribonuclease (ruvC) 34.7%
HP0675 integrase/recombinase (xerC) 31.6%
HP0995 integrase/recombinase (xerD) 27.8%
HP0323 membrane bound endonuclease (nuc) 31.1%
HP0676 methylated-DNA-protein-cysteine methyltransferase (dat1) 41.0%
HP0387 primosomal protein replication factor (priA) 36.3%
HP0153 recombinase (recA) 99.1%
HP0925 recombinational DNA repair protein (recR) 36.5%
HP0911 rep helicase, single-stranded DNA-dependent ATPase (rep) 33.8%
HP1362 replicative DNA helicase (dnaB) 39.4%
HP1383 restriction modification system S subunit 38.1%
HP0661 ribonuclease H (rnhA) 68.4%
HP1323 ribonuclease HII (rnhB) 36.3%
HP1245 single-strand DNA-binding protein (ssb) 32.6%
HP0348 single-stranded-DNA-specific exonuclease (recJ) 33.6%
HP1009 site-specific recombinase 21.3%
HP1541 transcription-repair coupling factor (trcF) 37.7%



HP0462 type I restriction enzyme S protein (hsdS) 37.0%
HP0463 type I restriction enzyme M protein (hsdM) 29.4%
HP0464 type I restriction enzyme R protein (hsdR) 31.7%
HP0846 type I restriction enzyme R protein (hsdR) 48.0%
HP0848 type I restriction enzyme S protein (hsdS) 37.0%
HP0850 type I restriction enzyme M protein (hsdM) 64.4%
HP1402 type I restriction enzyme R protein (hsdR) 26.6%
HP1403 type I restriction enzyme M protein (hsdM) 37.1%
HP1404 type I restriction enzyme S protein (hsdS) 36.0%
HP0092 type II restriction enzyme M protein (hsdM) 55.3%
HP0091 type II restriction enzyme R protein (hsdR) 60.7%
HP1369 type III restriction enzyme M protein (mod) 45.6%
HP1370 type III restriction enzyme M protein (mod) 37.0%
HP1371 type III restriction enzyme R protein 26.5%
HP0592 type III restriction enzyme R protein (res) 30.6%
HP1521 type III restriction enzyme R protein (res) 33.1%
HP1472 type IIS restriction enzyme M protein (mod) 32.4%
HP1367 type IIS restriction enzyme M1 protein (mod) (Moraxella bovis) 69.3%
HP1368 type IIS restriction enzyme M2 protein (mod) 33.0%
HP1517 type IIS restriction enzyme R and M protein (ECO57IR) 26.7%
HP W7i type IIS restriction enzyme R protein (BCGIB) 26.5%
HP1366 type IIS restriction enzyme R protein (MBOIIR) 37.1%
HP1208 ulcer associated adenine specific DNA methyltransferase 93.4%
HP1209 ulcer-associated gene restriction endonuclease (iceA) 95.5%
HP1347 uracil-DNA glycosylase (ung) 43.1%

TRANSCRIPTION
Degradation of RNA
HP1213 polynucleotide phosphorylase (pnp) 38.9%
DNA-dependent RNA polymerase
HP1293 DNA-directed RNA polymerase, alpha subunit (rpoA) 35.3%
HP1198 DNA-directed RNA polymerase, beta subunit (rpoB) 47.8%
Transcription factors
HP0866 transcription elongation factor GreA (greA) 60.3%
HP1514 transcription termination factor NusA (nusA) 39.1%
HP0001 transcription termination factor NusB (nusB) 30.5%
HP1203 transcription termination factor NusG (nusG) 41.0%
HP0550 transcription termination factor Rho (rho) 56.6%
RNA processing
HP0640 poly(A) polymerase (papS) 37.4%
HP0662 ribonuclease III (rnc) 37.3%

TRANSLATION
General
HP0944 translation initiation inhibitor, putative 45.6%
Aminoacyl tRNA synthetases
HP1241 alanyl-tRNA synthetase (alaS) 44.9%
HP0319 arginyl-tRNA synthetase (argS) 35.8%
HP0617 aspartyl-tRNA synthetase (aspS) 60.1%
HP0886 cysteinyl-tRNA synthetase (cysS) 97.3%
HP0476 glutamyl-tRNA synthetase (gltX) 43.1%
HP0643 glutamyl-tRNA synthetase (gltX) 39.3%
HP0960 glycyl-tRNA synthetase, alpha subunit (glyQ) 60.1%
HP0972 glycyl-tRNA synthetase, beta subunit (glyS) 33.6%
HP1190 histidyl-tRNA synthetase (hisS) 32.4%
HP1422 isoleucyl-tRNA synthetase (ileS) 49.7%
HP1547 leucyl-tRNA synthetase (leuS) 45.9%
HP0182 lysyl-tRNA synthetase (lysS) 58.6%
HP0417 methionyl-tRNA synthetase (metS) 42.4%
HP0403 phenylalanyl-tRNA synthetase, alpha subunit (pheS) *8.7%
HP0402 phenylalanyl-tRNA synthetase, beta subunit (pheT) 30.0%
HP0238 prolyl-tRNA synthetase (proS) 39.8%
HP1480 seryl-tRNA synthetase (serS) 48.3%
HP0123 threonyl-tRNA synthetase (thrS) 42.1%
HP1253 tryptophanyl-tRNA synthetase (trpS) 62.6%
HP0774 tyrosyl-tRNA synthetase (tyrS) 54.7%
HP1153 valyl-tRNA synthetase (valS) 43.7%
Degradation of proteins, peptides and glycopeptides
HP0570 amino peptidase a/i (pepA) 38.5%
HP0033 ATP-dependent Clp protease (clpA) 40.3%
HP0794 ATP-dependent clp protease proteolytic component (clpP) 64.6%
HP1379 ATP-dependent protease (lon) 43.3%
HP0223 ATP-dependent protease (sms) 41.0%
HP1374 ATP-dependent protease ATPase subunit (clpX) 66.3%
HP0264 ATP-dependent protease binding subunit (clpB) 97.7%
HP0169 collagenase (prtC) 40.1%
HP0516 heat-shock protein (hslU) ORF1 98.4%
HP0515 heat-shock protein (hslV) 67.1%
HP0470 oligoendopeptidase F (pepF) 87.9%
HP0657 processing protease (ymxG) 24.5%
HP1485 proline dipeptidase (pepQ) 35.5%
HP1350 protease 40.6%
HP1012 protease (pqqE) 29.6%
HP1435 protease IV (PspA) 41.7%
HP0404 protein kinase C inhibitor (SP:P16436) 40.5%
HP1019 serine protease (htrA) 52.3%
HP1584 sialoglycoprotease (gcp) 35.7%
HP0382 zinc-metalloprotease (YJR117W) 36.5%



Nucleoproteins
HP0835 histone-like DNA-binding protein HU (hup) 44.6%
Protein modification
HP0363 L-isoaspartyl-protein carboxyl methyltransferase (pcm) 43.0%
HP1299 methionine amino peptidase (map) 43.3%
HP1441 peptidyl-prolyl cis-trans isomerase B, cyclosporin-type rotamase (ppi) 58.1%
HP1123 peptidyl-prolyl cis-trans isomerase, FKBP-type rotamase (slyD) 40.4%
HP0793 polypeptide deformylase (def) 41.3%
Ribosomal proteins: synthesis and modification
HP1201 ribosomal protein L1 (rpl1) 62.0%
HP1200 ribosomal protein L10 (rpl10) 30.4%
HP1202 ribosomal protein L11 (rpl11) 63.8%
HP1068 ribosomal protein L11 methyltransferase (prmA) 38.4%
HP0084 ribosomal protein L13 (rpl13) 50.0%
HP1309 ribosomal protein L14 (rpl14) 65.9%
HP1301 ribosomal protein L15 (rpl15) 42.5%
HP1312 ribosomal protein L16 (rpl16) 65.4%
HP1292 ribosomal protein L17 (rpl17) 48.3%
HP1303 ribosomal protein L18 (rpl18) 45.5%
HP1147 ribosomal protein L19 (rpl19) 60.3%
HP1316 ribosomal protein L2 (rpl2) 68.9%
HP0126 ribosomal protein L20 (rpl20) 54.3%
HP0296 ribosomal protein L21 (rpl21) 46.1%
HP1314 ribosomal protein L22 (rpl22) 44.9%
HP1317 ribosomal protein L23 (rpl23) 31.7%
HP1308 ribosomal protein L24 (rpl24) 52.5%
HP0297 ribosomal protein L27 (rpl27) 64.7%
HP0491 ribosomal protein L28 (rpL28) 41.7%
HP1311 ribosomal protein L29 (rpL29) 45.6%
HP1319 ribosomal protein L3 (rpl3) 41.8%
HP0551 ribosomal protein L31 (rpl31) 49.3%
HP0200 ribosomal protein L32 (rpl32) 41.7%
HP1204 ribosomal protein L33 (rpL33) 55.1%
HP1447 ribosomal protein L34 (rpl34) 70.5%
HP0125 ribosomal protein L35 (rpl35) 60.3%
HP1297 ribosomal protein L36 (rpl36) 81.6%
HP1318 ribosomal protein L4 (rpl4) 40.6%
HP1307 ribosomal protein L5 (rpl5) 53.1%
HP1304 ribosomal protein L6 (rpl6) 44.4%
HP1199 ribosomal protein L7/L12 (rpl7/l12) 66.3%
HP0514 ribosomal protein L9 (rpl9) 39.6%
HP0399 ribosomal protein S1 (rps1) 30.5%
HP1320 ribosomal protein S10 (rps10) 56.5%
HP1295 ribosomal protein S11 (rps11) 56.5%
HP1197 ribosomal protein S12 (rps12) 79.3%
HP1296 ribosomal protein S13 (rps13) 55.3%
HP1306 ribosomal protein S14 (rps14) 68.3%
HP1040 ribosomal protein S15 (rps15) 57.3%
HP1151 ribosomal protein S16 (rps16) 46.8%
HP1310 ribosomal protein S17 (rps17) 55.4%
HP1244 ribosomal protein S18 (rps18) 65.5%
HP1315 ribosomal protein S19 (rps19) 61.1%
HP1554 ribosomal protein S2 (rps2) 49.6%
HP0076 ribosomal protein S20 (rps20) 41.4%
HP0562 ribosomal protein S21 (rps21) 42.4%
HP1313 ribosomal protein S3 (rps3) 56.7%
HP1294 ribosomal protein S4 (rps4) 51.2%
HP1302 ribosomal protein S5 (rps5) 65.5%
HP1246 ribosomal protein S6 (rps6) 32.1%
HP1196 ribosomal protein S7 (rps7) 62.5%
HP1305 ribosomal protein S8 (rps8) 45.0%
HP0083 ribosomal protein S9 (rps9) 60.4%
HP10*7 ribosome-binding factor A (rbfA) 26.3%
tRNA modification
HP1141 methionyl-tRNA formyltransferase (fmt) 37.5%
HP1497 peptidyl-tRNA hydrolase (pth) 46.6%
HP0361 pseudouridylate synthase I (hisT) 32.5%
HP1448 ribonuclease P, protein component (rnpA) 29.3%
HP1062 S-adenosylmethionine:tRNA ribosyltransferase-isomerase (queA) 39.3%
HP1513 selenocysteine synthase (selA) 36.2%
HP1148 tRNA (guanine-N1)-methyltransferase (trmD) 39.1%
HP1415 tRNA delta(2)-isopentenylpyrophosphate transferase (miaA) 30.7%
HP0281 tRNA-guanine transglycosylase (tgt) 45.6%
Translation factors
HP0247 ATP-dependent RNA helicase, DEAD-box family (deaD) 37.7%
HP0077 peptide chain release factor RF-1 (prfA) 52.6%
HP0171 peptide chain release factor RF-2 (prfB) 49.6%
HP1256 ribosome releasing factor (frr) 43.7%
HP1195 translation elongation factor EF-G (fusA) 67.5%
HP0177 translation elongation factor EF-P (efp) 45.1%
HP1555 translation elongation factor EF-Ts (tsf) 43.1%
HP1205 translation elongation factor EF-Tu (tufB) 89.5%
HP1298 translation initiation factor IF-1 (infA) 65.3%
HP1048 translation initiation factor IF-2 (infB) 45.4%
HP0124 translation initiation factor IF-3 (infC) 43.4%

TRANSPORT AND BINDING PROTEINS
General
HP0179 ABC transporter, ATP-binding protein 66.7%
HP0613 ABC transporter, ATP-binding protein 31.1%
HP0715 ABC transporter, ATP-binding protein 52.3%
HP1576 ABC transporter, ATP-binding protein (abc) 48.5%
HP1465 ABC transporter, ATP-binding protein (HI1087) 37.3%
HP1220 ABC transporter, ATP-binding protein (yhcG) 31.5%
HP0853 ABC transporter, ATP-binding protein (yheS) 36.3%
HP1577 ABC transporter, permease protein (yaeE) 43.1%
HP0607 acriflavine resistance protein (acrB) 29.7%
HP1432 histidine and glutamine-rich protein 50.0%
HP1427 histidine-rich, metal binding polypeptide (hpn) 100.0%
HP1206 multidrug-resistance protein (hetA) 26.5%
HP1082 multidrug-resistance protein (msbA) 32.4%
HP0600 multidrug-resistance protein (spaB) 29.7%
HP1181 multidrug-efflux transporter 29.1%
HP0497 sodium- and chloride-dependent transporters!. 7%
HP0498 sodium- and chloride-dependent transporter 30.8%
HP0214 sodium-dependent transporter (huNaDC-1) 36.6%
Amino acids, peptides and amines
HP0940 amino acid ABC transporter, periplasmic binding protein (yckK) 41.5%
HP0939 amino acid ABC transporter, permease protein (yckJ) 46.9%
HP1017 amino acid permease (rocE) 41.7%
HP0942 D-alanine glycine permease (dagA) 44.5%
HP0301 dipeptide ABC transporter, ATP-binding protein (dppD) 59.5%
HP0302 dipeptide ABC transporter, ATP-binding protein (dppF) 54.8%
HP0298 dipeptide ABC transporter, periplasmic dipeptide-binding protein (dppA) 39.8%
HP0299 dipeptide ABC transporter, permease protein (dppB) 49.3%
HP0300 dipeptide ABC transporter, permease protein (dppC) 52.5%
HP1506 glutamate permease (gltS) 56.9%
HP1171 glutamine ABC transporter, ATP-binding protein (glnQ) 51.9%
HP1172 glutamine ABC transporter, periplasmic glutamine-binding protein (glnH) 32.2%
HP1169 glutamine ABC transporter, permease protein (glnP) 27.6%
HP1170 glutamine ABC transporter, permease protein (glnP) 30.9%
HP0250 oligopeptide ABC transporter, ATP-binding protein (oppD) 39.1%
HP1252 oligopeptide ABC transporter, periplasmic oligopeptide-binding protein (oppA) 28.7%
HP1251 oligopeptide ABC transporter, permease protein (oppB) 59.6%
HP0251 oligopeptide ABC transporter, permease protein (oppC) 31.4%
HP0819 osmoprotection protein (proV) 38.3%
HP0818 osmoprotection protein (proWX) 30.4%
HP0055 proline permease (putP) 51.4%
HP0936 proline/betaine transporter (proP) 29.1%
HP0133 serine transporter (sdaC) 44.6%
Anions
HP0475 molybdenum ABC transporter, ATP-binding protein (modD) 38.4%
HP0473 molybdenum ABC transporter, periplasmic molybdate-binding protein (modA) 95.9%
HP0474 molybdenum ABC transporter, permease protein (modB) 28.7%
HP0313 nitrite extrusion protein (narK) 23.6%
HP1491 phosphate permease 34.3%
Carbohydrates, organic alcohols and acids
HP0143 2-oxoglutarate/malate translocator (SODiT1) 37.0%
HP1091 alpha-ketoglutarate permease (kgtP) 45.9%
HP0724 anaerobic C4-dicarboxylate transport protein (dcuA) 53.8%
HP1174 glucose/galactose transporter (gluP) 53.6%
HP0141 L-lactate permease (lctP) 65.5%
HP0140 L-lactate permease (lctP) 68.7%



Cations
HP0791 cadmium-transporting ATPase, P-type (cadA) 97.5%
cation efflux system protein (czcA) 37.3%
HP1328 cation efflux system protein (czcA) 28.9%
HP1329 cation efflux system protein (czcA) 31.3%
HP1503 cation-transporting ATPase, P-type (copA) 30.3%
HP1073 copper ion binding protein (copP) 92.4%
HP1072 copper-transporting ATPase, P-type (copA) 93.9%
HP0471 glutathione-regulated potassium-efflux system protein (kefB) 99.3%
HP0687 iron(II) transport protein (feoB) 33.6%
HP1561 iron(III) ABC transporter, periplasmic iron-binding protein (ceuE) 27.5%
HP1562 iron(III) ABC transporter, periplasmic iron-binding protein (ceuE) 28.2%
HP0888 iron(III) dicitrate ABC transporter, ATP-binding protein (fecE) 34.4%
HP0689 iron(III) dicitrate ABC transporter, permease protein (fecD) 38.3%
HP0686 iron(III) dicitrate transport protein (fecA) 29.7%
HP0807 iron(III) dicitrate transport protein (fecA)
HP1*00 iron(III) dicitrate transport protein (fecA)
HP1344 magnesium and cobalt transport protein (corA)
HP1183 Na+/H+ antiporter (napA)
HP1552 Na+/H+ antiporter (nhaA)
HP1077 nickel transport protein (nixA)
HP0490 putative potassium channel protein, putative
Nucleosides, purines and pyrimidines
HP1290 nicotinamide mononucleotide transporter (pnuC)
HP1180 pyrimidine nucleoside transport protein (nupC)

Other 



26.3%
26.3%

49.2%
98.7%

25.7%

28.0%
32.3%



HP0876 iron-regulated outer membrane protein (frpB) 27.6%
HP0915 iron-regulated outer membrane protein (frpB) 28.1%
HP0916 iron-regulated outer membrane protein (frpB) 28.5%
HP1129 biopolymer transport protein (exbD) 29.7%
HP1130 biopolymer transport protein (exbB) 33.5%
HP1339 biopolymer transport protein (exbB) 46.5%
HP1340 biopolymer transport protein (exbD) 35.5%
HP1445 biopolymer transport protein (exbB) 45.5%
HP1446 biopolymer transport protein (exbD) 36.2%
HP1512 iron-regulated outer membrane protein (frpB)
HP0653 nonheme iron-containing ferritin (pfr) 99.4%
HP1341 siderophore-mediated iron transport protein (tonB) 37.2%



OTHER CATEGORIES
General
HP0924 4-oxalocrotonate tautomerase (dmpI) 37.7%
HP1034 ATP-binding protein (ylxH) 36.3%
HP1000 PARA protein 29.7%
HP1139 Spo0J regulator (soj) 47.4%
HP0827 ss-DNA binding protein 12RNP2 precursor 46.8%
Adaptations and atypical conditions
HP1496 general stress protein (ctc) 26.5%
HP1483 gerC2 protein (gerC2) 33.2%
HP0927 heat shock protein (htpX) 32£%
HP0280 heat shock protein B (ibpB) 27.2%
HP1228 invasion protein (invA) 38.2%
HP0970 nickel-cobalt-cadmium resistance protein (nccB) 21.1%
HP1444 small protein (smpB) 42.1%
HP0930 stationary-phase survival protein (surE) 37.7%
HP0315 virulence associated protein D (vapD) 70.2%
HP0967 virulence associated protein D (vapD) 28.9%
HP1248 virulence associated protein homolog (vacB) 36X1%
HP0885 virulence factor mviN protein (mviN) 33.5%
Colicin-related functions
HP1126 colicin tolerance-like protein (tolB) 25.7%
HP0428 phage/colicin/tellurite resistance cluster terY protein 25.6%
Drug and analog sensitivity
HP1431 16S rRNA (adenosine-N6,N6-)-dimethyltransferase (ksgA) 35.5%
HP0606 membrane fusion protein (mtrC) 24.2%
HP0630 modulator of drug activity (mda66) 62.3%
HP1476 phenylacrylic acid decarboxylase 39.7%
HP1165 tetracycline resistance protein tetA(P), putative 27.0%
Transposon-related functions
HP1008 IS200 insertion sequence from SARA17 33.5%
HP0414 IS200 insertion sequence from SARA17 33.9%
HP0998 IS605 transposase (tnpA) 97.2%
HP1096 IS605 transposase (tnpA) 97.2%
HP1535 IS605 transposase (tnpA) 97.2%
HP0437 IS605 transposase (tnpA) 97.2%
HP0989 IS605 transposase (tnpA) 97.2%
HP0997 IS605 transposase (tnpB) 93.4%
HP1095 IS605 transposase (tnpB) 93.4%
HP1534 IS605 transposase (tnpB) 83.4%
HP0438 IS605 transposase (tnpB) 83.4%
IS605 transposase (tnpB) 93.4%
HP0413 transposase-like protein, PS3IS 33.6%
HP1007 transposase-like protein, PS3IS 34.2%
Other
HP0739 2-hydroxy-6-oxohepta-2,4-dienoate hydrolase



HYPOTHETICAL
General
HP0831 conserved hypothetical ATP binding protein 32.3%
HP0066 conserved hypothetical ATP-binding protein 34.7%
HP0269 conserved hypothetical ATP-binding protein 37.7%
HP0312 conserved hypothetical ATP-binding protein 34.1%
HP1321 conserved hypothetical ATP-binding protein 30.8%
HP1430 conserved hypothetical ATP-binding protein 38.1%
HP1507 conserved hypothetical ATP-binding protein 51.6%
HP1567 conserved hypothetical ATP-binding protein 40.5%
HP1026 conserved hypothetical helicase-like protein 35.2%
HP0022 conserved hypothetical integral membrane protein 30.8%
HP0189 conserved hypothetical integral membrane protein 43.1%
HP0226 conserved hypothetical integral membrane protein 27.6%
HP0228 conserved hypothetical integral membrane protein 43.2%
HP0234 conserved hypothetical integral membrane protein 32.4%



HP0258 conserved hypothetical integral membrane protein 32.7%
HP0284 conserved hypothetical integral membrane protein 29.2%
HP0362 conserved hypothetical integral membrane protein 28.8%
HP0415 conserved hypothetical integral membrane protein 44.4%
HP0467 conserved hypothetical integral membrane protein 100.0%
HP0571 conserved hypothetical integral membrane protein 29.5%
HP0644 conserved hypothetical integral membrane protein 30.3%
HP0677 conserved hypothetical integral membrane protein 28.5%
HP0693 conserved hypothetical integral membrane protein 46.7%
HP0718 conserved hypothetical integral membrane protein 33.5%
HP0737 conserved hypothetical integral membrane protein 33.3%
HP0758 conserved hypothetical integral membrane protein 47.5%
HP0759 conserved hypothetical integral membrane protein 31.1%
HP0787 conserved hypothetical integral membrane protein 25.2%
HP0851 conserved hypothetical integral membrane protein 37.3%
conserved hypothetical integral membrane protein 36.3%
HP0946 conserved hypothetical integral membrane protein 35.5%
HP0952 conserved hypothetical integral membrane protein 38.5%
HP0983 conserved hypothetical integral membrane protein 32.5%
HP1044 conserved hypothetical integral membrane protein 30.6%
HP1061 conserved hypothetical integral membrane protein 35.0%
HP1080 conserved hypothetical integral membrane protein 44X3%
HP1162 conserved hypothetical integral membrane protein 27.5%
HP1175 conserved hypothetical integral membrane protein 40.6%
HP1184 conserved hypothetical integral membrane protein 23.5%
HP1185 conserved hypothetical integral membrane protein 55.5%
HP1225 conserved hypothetical integral membrane protein 31.6%
HP1234 conserved hypothetical integral membrane protein 29.0%
HP1235 conserved hypothetical integral membrane protein 30.9%
HP1330 conserved hypothetical integral membrane protein 41.7%
HP1331 conserved hypothetical integral membrane protein 33.6%
HP1343 conserved hypothetical integral membrane protein 49.1%
HP1363 conserved hypothetical integral membrane protein 33.1%
HP1407 conserved hypothetical integral membrane protein 22.4%
HP1466 conserved hypothetical integral membrane protein 30.9%
HP1484 conserved hypothetical integral membrane protein 41.2%
HP1466 conserved hypothetical integral membrane protein 23.8%
HP1487 conserved hypothetical integral membrane protein 30.7%
HP1509 conserved hypothetical integral membrane protein 34.3%
HP1548 conserved hypothetical integral membrane protein 30.6%
HP0138 conserved hypothetical iron-sulfur protein 41.2%
HP1438 conserved hypothetical lipoprotein 32.0%
HP0151 conserved hypothetical membrane protein 21.8%
HP0575 conserved hypothetical membrane protein 38.5%
HP1258 conserved hypothetical mitochondrial protein 4 23.2%
HP1492 conserved hypothetical nifU-like protein 48.2%
HP0032 conserved hypothetical protein 37.0%
HP0035 conserved hypothetical protein 34.1%
HP0086 conserved hypothetical protein 28.7%
HP0094 conserved hypothetical protein 29.5%
HP0100 conserved hypothetical protein 32.0%
HP0102 conserved hypothetical protein 29.3%
HP0105 conserved hypothetical protein 39.7%
HP0117 conserved hypothetical protein 34.2%
HP0162 conserved hypothetical protein 36.7%
HP0216 conserved hypothetical protein 33.9%
HP0233 conserved hypothetical protein 30.5%
HP0248 conserved hypothetical protein 30.7%
HP0274 conserved hypothetical protein 38.5%
HP0265 conserved hypothetical protein 30.5%
HP0309 conserved hypothetical protein 31.3%
HP0310 conserved hypothetical protein 33.7%
HP0318 conserved hypothetical protein 47.2%
HP0328 conserved hypothetical protein 30.7%
HP0334 conserved hypothetical protein 30.8%
HP0347 conserved hypothetical protein 31.8%
HP0373 conserved hypothetical protein 31.4%
HP0374 conserved hypothetical protein 24.7%
HP0388 conserved hypothetical protein 39.8%
HP0395 conserved hypothetical protein 39.9%
HP0396 conserved hypothetical protein 33.7%
HP0419 conserved hypothetical protein 45.6%
HP0447 conserved hypothetical protein 38.2%
HP0465 conserved hypothetical protein 95.5%
HP0466 conserved hypothetical protein 95.7%
HP0468 conserved hypothetical protein 97.1%
HP0469 conserved hypothetical protein 95.1%
HP0496 conserved hypothetical protein 99.2%
HP0507 conserved hypothetical protein 37.2%
HP0519 conserved hypothetical protein 95.3%
HP0552 conserved hypothetical protein 37.5%
HP0553 conserved hypothetical protein 30.0%
HP0639 conserved hypothetical protein 41.0%
HP0654 conserved hypothetical protein 32.5%
HP0656 conserved hypothetical protein 36X3%
HP0707 conserved hypothetical protein 40.1%
HP0709 conserved hypothetical protein 49.6%
HP0710 conserved hypothetical protein 33.7%
HP0716 conserved hypothetical protein 30.2%



HP0728 conserved hypothetical protein
HP0734 conserved hypothetical protein
HP0741 conserved hypothetical protein
HP0745 conserved hypothetical protein
HP0747 conserved hypothetical protein
HP0760 conserved hypothetical protein
HP0810 conserved hypothetical protein
HP0813 conserved hypothetical protein
HP0823 conserved hypothetical protein
HP0860 conserved hypothetical protein
HP0890 conserved hypothetical protein
HP0891 conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein (plasmid pHPM180)

conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
conserved hypothetical secreted protein
HP0977 conserved hypothetical secreted protein
HP0980 conserved hypothetical secreted protein
HP1075 conserved hypothetical secreted protein
HP1098 conserved hypothetical secreted protein
HP1117 conserved hypothetical secreted protein
HP1216 conserved hypothetical secreted protein
HP1285 conserved hypothetical secreted protein
HP1286 conserved hypothetical secreted protein
HP1464 conserved hypothetical secreted protein
HP1468 conserved hypothetical secreted protein
HP1551 conserved hypothetical secreted protein



HP0894
HP0926
HP0934
HP0956
HP0959
HP0966
HP0975
HP1020
HP1037
HP1046
HP1049
HP1066
HP1149
HP1160
HP1182
HP1214
HP1221
HP1240
HP1242
HP1259
HP1284
HP1291
HP1335
HP1337
HP1338
HP1394
HP1401
HP1413
HP1414
HP1417
HP1423
HP1426
HP1428
HP1443
HP1449
HP1453
HP1459
HP1504
HP1510
HP1533
HP1570
HP1573
HP1587
HP1588
HP1589
HP0713

HP0028
HP0139
HP0160
HP0190
HP0211
HP0235
HP0257
HP0320
HP0506
HP0518
HP0785



^3% 

36.1%
31.0%
32.5%
27.8%
52.1%

32.2%

33.6%
39.1%
39.6%
30.7%
33.6%
30.2%
.«::% 

. • <% 

5% 

sK,a% 

32.6%
39.7%
41.3%
24.7%
34.7%
34.6%
21.5%
42.4%

22.5%
42.3%

44.6%

36.8%
26.3%
n 9% 

:?2% 
36.2%
33.6%
27.5%
41.6%
27.4%
23.7%
40.3%
40.0%
37.8%
37.9%
39.0%
26.8%
30.1%

23.9%
30.6%

.-"j.4% 

■;j5% 

42.2%
39.0%
32.0%
35.1%

41.6%
42.1%
37.1%
30.6%
31.4%
24.3%
31.5%
29.2%
36.4%
29.6%
W9% 
£i.G% 

::9.t% 
29.4%
57.4%
42.9%
27.0%
32.3%
31.9%
38.0%
37.5%
27.4%
29.8%
42.7%



UNKNOWN
General
HP0390 adhesin-thiol peroxidase (tagD) ?3.3%
HP1193 aldo-keto reductase, putative ^ 6%
HP0872 alkylphosphonate uptake protein (phnA) 6M%
HP0207 ATP-binding protein (mpr) 38.9%
HP0136 bacterioferritin comigratory protein (bcp) 35.5%
HP0485 catalase-like protein 30.8%
HP1104 cinnamyl-alcohol dehydrogenase ELI3-2 (cad) 44.0%
HP0931 exonuclease VII-like protein (xseA) 42.5%
GTP-binding protein (gtp1) 48.' %
HP0303 GTP-binding protein (obg) 48.2%
HP0834 GTP-binding protein homologue (yphC) 36.7%
HP0480 GTP-binding protein, fusA-homolog (yihK) 54.1%
HP1489 lipase-like protein 21.7%
HP0405 nifS-like protein 27.3%
HP0221 nifU-like protein 37.3%
HP0658 PET112-like protein 45.4%
pfs protein (pfs) 3- E^o
HP0322 poly E-rich protein &:.ry
HP0625 protein E (gcpE) £ '' 7 ^
HP0431 protein phosphatase 2C homolog (ptc1) 30-^%
HP0624 solute-binding signature and mitochondrial signature protein (aspB) 26 *>o
HP0377 thiol:disulfide interchange protein (dsbC), putative 26.J^




nuclease, iceA1, and its associated DNA adenine methyltransferase
(M.HpyI) genes 21,22. In addition to the complete systems,
adenine-specific, and four cytosine-specific methyltransferases,
and one of unknown specificity were found. Each of these
has an adjacent gene with no database match, suggesting that they
may function as part of restriction-modification systems.

Transcription and translation

Although analysis of gene content suggests that H. pylori has a basic
transcriptional and translational machinery similar to that of E. coli,
interesting differences are observed. For example, no genes for a
catalytic activity in tRNA maturation (rnd, rph, or rnpB) were
identified and of the three known ribonucleases involved in mRNA
degradation, only polyribonucleotide phosphorylase was found.
Twenty-one genes coding for 18 of the 20 tRNA synthetases normally
required for protein biosynthesis were found.
1-As in most other completely sequenced bacterial genomes, the 
Jcne for glutaminyl-tRNA synthetase, glnS, is missing, and the 
' existence 0 f a transamidation process is assumed. It is also possible 
^Kat the product of the second glutamyl-tRNA synthetase gene, gltX> 
'present in H. pylori, may have acquired the glutaminyl-tRNA 
Synthetase function. H. pylori provides the first example of a 
bacterial genome apparently lacking an asparaginyl-tRNA synthe- 
tase gene, asnS. A transamidation process to form Asn-tRNAAsn 
Irom Asp-tRNAAsn has been reported for the archaeon Haloferax 
%lcaniP and may also operate in H. pylori. Most intriguing, 
however, is the finding that in H. pylori the genes encoding the 0 
;ahd P' subunits of RNA polymerase are fused. In all studied 
f rbkaryotes the two genes are contiguous, but separate, and are 
part of the same transcriptional unit. Whether this gene fusion in H. 
pylori results in a fused protein, or whether the transcriptional or 
jtranslational product of the fusion is subject to splicing, is currently 
iiot known. It is worth noting that an artificial fusion of the E. coli 

'JFlgure 2 Circular representation of 
•the H. pylori 26695 chromosome. 
Outer concentric circle: predicted 
awing regions on the plus strand 
■classified as to role according to the 
colour code in Fig. 1 (except for 
Jinknowns and hypothetical, which 
ere in black). Second concentric 
^circle: predicted coding regions on 
3he minus strand. Third and fourth 
jponcentric circles: IS elements (red) 
;end other repeats (green) on the plus 
end minus strand, respectively. Fifth 
arid sixth concentric circles: tRNAs 
(blue), rRNAs (red), and sRNAs 
(green) on the plus and minus 
strand, respectively. 
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rpoB and rpoC genes is f viable and results in a transcriptional 
complex, which has the same stoichiometry as the native complex 
(K. Severinov, personal communication). 

Adhesion and adaptive antigenic variation 

Most pathogens show tropism to specific tissues or cell types and 
often use several adherence mechanisms for successful attachment. 
H. pylori may use at least five different adhesins to attach to gastric 
epithelial cells 5 . One of them, HpaA (HP0797), was previously 
identified as a lipoprotein in the flagellar sheath and outer 
membrane 5,23 . In addition to the HpaA orthologue, we have identi- 
fied 19 other lipoproteins. Few have an identifiable function, but 
some are likely to contribute to the adherence capacity of the 
organism. 

Two adhesins 24-26 , one of which mediates attachment to the
Lewis b histo-blood group antigens, belong to the large family of 
outer membrane proteins (OMP) (Fig. 3) (T. Boren and R. Haas, 
personal communication). It is conceivable that other members of 
these closely related proteins also act as adhesins. Given the large 
number of sequence-related genes encoding putative surface- 
exposed proteins, the potential exists for recombinational events 
leading to mosaic organization. This could be the basis for antigenic 
variation in H. pylori and an effective mechanism for host defence 
evasion, as seen in M. genitalium 27 . 

At least one other mechanism for antigenic variation could
operate in H. pylori. The DNA sequences at the beginning of eight
genes, including five members of the OMP family, contain stretches
of CT or AG dinucleotide repeats (Table 3a). In addition, poly(C) or
poly(G) tracts occur within the coding sequence of nine other genes
(Table 3b). Slipped-strand mispairing within such repeats is a
documented feature of one mechanism of genotypic variation 28,29 .
These mechanisms may have evolved in bacterial pathogens to
increase the frequency of phenotypic variation in genes involved in
Figure 3 Multiple sequence alignment of members of the outer membrane
protein family of H. pylori. These proteins were identified as OMPs based on the
characteristic alternating hydrophobic residues at their carboxy termini. All
members of this family have one domain of similarity at the amino-terminal end
and seven domains of similarity at their carboxy-terminal end. Note that the first 11
of these OMPs share extensive similarity over their entire length. Four of the
OMPs were identified as porins (Hops) based on identity to published
amino-terminal sequences, represented at the top of the alignment 50 . The most likely
candidate for HopD is HP0913, which has 15 matches to the first 20-residue
N-terminal peptide sequence 50 . These differences may be due to strain variability.
The program Signal-P 48 was used to identify cleavage sites and signal peptides
(underlined). Four of the OMPs have TTG start codons (HP1156, HP0252, HP1113,
HP0796). Numbers embedded in the sequences represent amino acids omitted
from the alignment. The star symbols indicate that HP722, HP725 and HP9
proteins contain a frameshift in their signal-peptide-coding region. These
frameshifts are associated with the presence of dinucleotide repeats (Table 3).
interactions with their hosts 28 . Such 'contingency' genes
encode surface structures like pilins, lipoproteins or enzymes that
produce lipopolysaccharide molecules 28 . Our analysis suggests that
the seventeen genes reported in Table 3a,b belong to this category
and thus may provide an example of adaptive evolution in H. pylori.
Phenotypic variation at the transcriptional level may also operate
in H. pylori. Examples of repetitive DNA mediating transcriptional
control have been documented by the presence of oligonucleotide
repeats in promoter regions 29 . Homopolymeric tracts of A or T in
potential promoter regions of eighteen genes were found, including
eight members of the OMP family (Table 3c).
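
The kind of scan that underlies Table 3 can be sketched in a few lines of Python. This is only an illustration, not the RepScan program named in the Methods: the function name and the two thresholds are assumptions, chosen so that the shortest entries in Table 3 (six dinucleotide copies, a 12-base homopolymeric tract) would still be reported.

```python
import re


def find_repeat_tracts(seq, min_dinuc_copies=6, min_homopolymer=12):
    """Return (kind, unit, start, size) tuples for simple repeats in seq.

    size is the copy number for dinucleotide repeats and the tract
    length in bases for homopolymeric tracts. start is 0-based.
    Both thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []

    # A two-base unit made of two different bases, repeated in tandem.
    dinuc = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_dinuc_copies - 1))
    for m in dinuc.finditer(seq):
        hits.append(("dinucleotide", m.group(1), m.start(), len(m.group(0)) // 2))

    # A single base repeated at least min_homopolymer times.
    homo = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for m in homo.finditer(seq):
        hits.append(("homopolymer", m.group(1), m.start(), len(m.group(0))))

    return sorted(hits, key=lambda h: h[2])


# Toy sequence (not taken from the genome): 8 CT copies, then a 14-base A tract.
example = "ATG" + "CT" * 8 + "GAGC" + "A" * 14 + "TT"
for hit in find_repeat_tracts(example):
    print(hit)
```

Because each dinucleotide unit is two bases long, gaining or losing a single copy shifts the downstream reading frame, which is why the copy number matters for whether a gene is read through or truncated.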

Virulence of individual H. pylori isolates has been measured by
the ability to produce a cytotoxin-associated protein (CagA) and
an active vacuolating cytotoxin (VacA) 5 . The cagA gene, though not
a virulence determinant, is positioned at one end of a pathogenicity
island containing genes that elicit the production of interleukin
(IL)-8 by gastric epithelial cells 11,30 . Consistent with its more virulent
character, H. pylori strain 26695 contains a single contiguous PAI
region 11 (Fig. 4).

VacA induces the formation of acidic vacuoles in host epithelial 
cells, and its presence is associated epidemiologically with tissue 
damage and disease 31 . VacA may not be the only ulcer-causing factor 
as 40% of H. pylori strains do not produce detectable amounts of the 
cytotoxin in vitro 5 . Sequence differences at the amino terminus and 
central sections are noted among VacA proteins derived from Tox + 
and Tox - strains 31 . This Tox + H. pylori strain contains the more
toxigenic S1a/m1 type cytotoxin and three additional large proteins
with moderate similarities to the carboxy-terminal end of the active 



[Figure 4 graphic omitted; recoverable labels: 11638, cag Pathogenicity Island, cagI, tnpA, tnpB]

Figure 4 Comparison between the Cag pathogenicity islands of the sequenced
strain 26695 and the NCTC11638 strain. The twenty nine ORFs of the contiguous
PAI in strain 26695 are represented together with the corresponding ORFs from
the PAI present in NCTC11638 (AC000108 and U60176). The PAI in NCTC11638 is
divided by the IS605 elements into two regions, cagI and cagII. The PAI in
NCTC11638 is flanked by a 31-bp (TTACAATTTGAGCCCATTCTTTAGCTTGTTTT)
direct repeat (vertical arrows) as described 11 . Some of the genes encode proteins
with similarity to proteins involved either in DNA transfer (Vir and Tra proteins) or in
export of a toxin (Ptl protein) 10 . However, these genes do not have the conserved
contiguous arrangement found in the VirB, Tra and Ptl operons, suggesting that
the PAI is not derived from these systems. Most genes of the PAI have no
database match, contrary to a previous suggestion 11 . Thirteen of the proteins have
a signal peptide (squiggle line), three of them with a weaker probability (squiggled
[...]). The average length of the signal peptides is 25 amino acids, suggesting
that this PAI is of Gram-negative origin. Eight proteins are predicted to have at
least two membrane-spanning domains and to be integral membrane proteins
(IM) 47 . Although the two PAI are ~97% identical at the nucleotide level, there are
several notable and perhaps biologically relevant differences between the two
sequences. Four of the genes differ in size. In the PAI of strain 26695, HP520 and
HP521 are shorter, whereas HP523 is longer, and HP527 actually spans both ORF13
and 14. In addition, the N-terminal part of HP527 is 129 amino acids longer than the
corresponding region in ORF14. HP548/549 contains a frameshift and is therefore
probably inactive in strain 26695. The stippled box preceding ORF13 represents
an N-terminal extension not annotated in the GenBank entry for the PAI of
NCTC11638. The 'x' indicates ORFs that are neither GeneMark-positive nor
GeneSmith-positive, so were not included in our gene list. However, these
ORFs may be biologically significant. We do not represent cagR as an ORF,
because it is completely contained within ORFQ, and is GeneMark-negative.




[Figure 5 graphic omitted; recoverable labels: HP0887 (VacA), 1,306 aa; 37K and 58K fragments; hydrophilic domain, cleavage domain, 33K hydrophobic domain]



Figure 5 Conserved domains of VacA and related proteins. HP887 is the
vacuolating cytotoxin (vacA) gene from H. pylori 26695 strain. HP610, HP922 and
[...] are related proteins. Blocks of aligned sequence and the length of each
[...] are shown. Arrows designate the extents of each VacA domain. The
hydrophilic domain (blue boxes) contains the site in VacA at which the N-terminal
[...] is cleaved into 37K and 58K fragments. The putative cleavage site
([...]QQNS) differs from that of three cytotoxic strains (CCUG 1784, 60190, G39;
AKNDKXES) and is not conserved in the other three VacA-related proteins. The
cleavage domain (black boxes) of VacA contains a pair of Cys residues 60
residues upstream from the site at which the C terminus is cleaved. These
residues are not conserved in the other three proteins. The 33K C-terminal
hydrophobic domain (red boxes) in VacA is thought to form a pore through which
the toxin is secreted. The other three proteins show 26-31% sequence similarity
to VacA in this region. The other coloured boxes represent regions of similarity.
cytotoxin (~26-31%) (Fig. 5). However, they lack the paired-
cysteine residues and the cleavage site required for release of the 
VacA toxin from the bacterial membrane 31 (Fig. 5). We propose that 
these proteins may be retained on the outside surface of the cell 
membrane and contribute to the interaction between H. pylori and 
host cells. 

The surface-exposed lipopolysaccharide (LPS) molecule plays an 
important role in H. pylori pathogenesis 32 . The LPS of H. pylori is 
several orders of magnitude less immunogenic than that of enteric 
bacteria 33 and the O antigen of many H. pylori isolates is known to
mimic the human Lewis x and Lewis y blood group antigens 32 . Genes
for synthesis of the lipid A molecule, the core region, and the O
antigen were identified. Two genes with low similarity to
fucosyltransferases (HP379, HP651) were found and may play a role in the
LPS-Lewis antigen molecular mimicry. Our analysis also suggests
that three genes, two glycosyltransferases (HP208 and HP619) and 
one fucosyltransferase (HP379), may be subject to phase variation 
(Table 3a, b). 

As with other pathogens, H. pylori probably requires an
iron-scavenging system for survival in the host 5 . Genome analysis
suggests that H. pylori has several systems for iron uptake. One is
analogous to the siderophore-mediated iron-uptake fec system of
E. coli, except that it lacks the two regulatory proteins (FecR and FecI)
and is not organized in a single operon. Unlike other studied
systems, H. pylori has three copies of each of fecA, exbB and exbD.
A second system, consisting of a feoB-like gene without feoA,
suggests that H. pylori can assimilate ferrous iron in a fashion
similar to the anaerobic feo system of E. coli. Other systems for iron
uptake present in H. pylori consist of the three frpB genes which
encode proteins similar to either haem- or lactoferrin-binding
proteins. Finally, H. pylori contains NapA, a bacterioferritin 34 , and
Pfr, a non-haem cytoplasmic iron-containing ferritin used for
storage of iron 35 . The global ferric uptake regulator (Fur)
characterized in other bacteria is also present in H. pylori. Consensus
sequences for Fur-binding boxes were found upstream of two [...]
genes, the three frpB genes and fur.

H. pylori motility is essential for colonization 36 . It enables the
bacterium to spread into the viscous mucous layer covering the
gastric epithelium. At least forty proteins in the H. pylori genome
appear to be involved in the regulation, secretion and assembly of
the flagellar architecture. As has been reported for the flaA and flaB
genes, we identified sigma 28 and sigma 54-like promoter elements
upstream of many flagellar genes, underscoring the complexity of
the transcriptional regulation of the flagellar regulon 5 .



Acidity, pH and acid tolerance 

H. pylori is unusual among pathogenic bacteria in its ability to
colonize host cells in an environment of high acidity. As it enters the
gastric environment by oral ingestion, the organism is transiently
subjected to the extreme pH of the lumen side of the gastric mucous
layer (pH ~2). The survival of H. pylori in acidic environments is
probably due to its ability to establish a positive inside-membrane
potential 37 and subsequently to modify its microenvironment
through the action of urease and the release of factors that inhibit
acid production by parietal cells 5 . A switch in membrane polarity
provides an electrical barrier that prevents the entry of protons
(H + ). A positive cell interior can be created by the active extrusion of
anions or by a proton diffusion potential. The latter model appears
more likely as no clear mechanism for electrogenic anion efflux is
apparent in the genome. A proton diffusion potential would require
the anion permeability of the cytoplasmic membrane to be low and
thus far, only three anion transporters have been identified. However,
it remains to be determined whether anion conductances are
associated with other proteins: the MDR-like transporters (HP6[...],
HP1082 and HP1206) or hypotheticals. Although it has been
suggested that proton-translocating P-type ATPases could mediate
survival in acid conditions by the extrusion of protons from the
cytoplasm 38 , this idea is not supported by the identified transporters.




Table 3 Homopolymeric tracts and dinucleotide repeats in H. pylori

(a)

HP no.  ID               No. of repeats  Gene status  Poly(A) or Poly(T) tracts in 5' intergenic region
9       OMP              11 CT           Off          Poly(A)
208     glycos. transf.  11 AG           Truncated    Poly(A)
638     OMP              6 CT            On           No
722     OMP              8 CT            Off          Poly(T)
725     OMP              6 CT            Off          Poly(T)
744     Hypo             9 AG            Truncated    No
896     OMP              11 CT           On           Poly(A)
1417    Cons. Hypo       9 AG            Truncated    No

Nucleotide sequence at the beginning of HP0722 showing the CT dinucleotide repeat and the poly T tract. The putative ribosome binding site is shown in green. Translation
starting at the designated methionine leads to a truncated product. The addition or deletion of two CT repeats, by 'slipped-strand mispairing', will restore the frame.

CCAAAAATC 1 1 1 I I 1 1 I I I I TTTTTG AAATCCAATAAATTTATG GTAAAGT-3 7b pTiTAC AATAAAAAAATTACTTTAAG G AAC ATTT '

TATGAAAMGACMTTCT ACTCTCTCTCTCTCTCT I I 1 1 KjIgAGCGCCGGCT i

YE K DNS T L S C 5~~ L~A S S L LHAED N G F FVSAG Y 1
MKKTI LL SLSLSLHRSCTLKTTA F L *



(b) Homopolymeric poly(C) and poly(G) tracts within coding sequence

HP no.  ID               Tract length  Gene status
58      Hypo             C15           Off
217     Hypo             G12           On
379     fucosyl transf.  C13           On
464     Type I R         C15           On
619     glycos. transf.  C13           Truncated
651     Hypo             C13           On
1353    Hypo             C15           Truncated
1471    Type IIS-R       G14           On
1522    Methylase        G12           Truncated

(c) Genes possibly regulated by homopolymeric poly(A) or poly(T) tracts in 5' intergenic regions

HP no.  ID    Tract    HP no.  ID    Tract    HP no.  ID    Tract
9       OMP   A14      25      OMP   T15      208     rfaJ  A[...]
227     OMP   T14      228     IMP   A14      349     pyrG  T[...]
350     IMP   A15      547     cagA  A14      629     Hypo  T[...]
722     OMP   T16      725     OMP   T14      733     Hypo  T[...]
876     frpB  T16      896     OMP   A14      912     OMP   T[...]
1342    OMP   A14      1400    fecA  A16
The P-type ATPase sequences in H. pylori (copAP, HP791, and
[...]03) are more closely related to divalent cation transporters
than to ATPases with specificity for protons or monovalent cations.
One of them, HP0791, is involved in Ni 2+ supply, an essential
component of urease activity 39 . The others may be involved in the
elimination of toxic metals from the cytoplasm and not in pH
regulation.



Additional mechanisms of pH homeostasis may well contribute 
to H. pylori survival. A change in protein content observed in 
response to a shift of extracellular pH from 7.5 to 3.0 suggests the 
presence of an acid-inducible response 40 . Although H. pylori lacks 
most orthologues of the genes that are acid-induced in E. coli and
Salmonella typhimurium, including the amino-acid decarboxylases 
and formate hydrogen lyase, certain virulence factors, outer membrane 



[Figure 6 graphic omitted; recoverable panel titles: ACID TOLERANCE, ENERGY PRODUCTION AND FERMENTATION, BIOSYNTHETIC PATHWAYS, DEGRADATIVE PATHWAYS; labels: Zn2+, Co2+, Cd2+]


Figure 6 Solute transport and metabolic pathways of Helicobacter pylori.
Transporters identified by sequence comparisons are characteristic of
Gram-negative bacteria. Colours correspond to transport role categories defined by
Riley 15 : blue, amino acids, peptides and amines; red, anions; yellow,
carbohydrates, organic alcohols and acids; green, cations; and purple, nucleosides,
purines and pyrimidines. Numerous permeases (ovals) with specificity for
amino acids (fecE, proP, dagA, gltS, putP and sdaC) or carbohydrates (SOD/77,
VP. factP, cduA, kgtP) import organic nutrients. Structurally related permease
proteins maintain ionic homeostasis by transporting HPO4 2- (HI1604), NO2 - (narK),
and Na + (nhaA, napA). Primary active-transport systems, independent of the
[...] cycle, are also apparent. Included in this group are ATP-binding
protein-cassette (ABC) transporters (composite figures of 2 diamonds, 2 circles, 1
oval) for the uptake of oligopeptides (oppACD), dipeptides (dppABCDF), proline
([...]), glutamine (glnHMPQ), molybdenum (modABD), and iron III (fecED),
P-type ATPases that extrude toxic metals from the cell (copAP and cadA), and the
glutathione-regulated potassium-efflux protein (kefB). Transporters for the
accumulation of ionic cofactors are encoded by nixA (Ni 2+ for urease activation), corA
([...] 2+ for phosphohydrolases, phosphotransferases, ATPases) and feoB (Fe 2+
import under anaerobic conditions for cytochromes, catalase). An integrated
view of the main components of the central metabolism of H. pylori strain 26695 is
presented. The use of glucose as the sole carbohydrate source is emphasized.
Urease, a multisubunit Ni 2+ -binding enzyme, is crucial for colonization and for
survival of H. pylori at acid pH, and is indicated as a complex (purple circle) with
Hpn, a Ni 2+ -binding cofactor, and a newly identified Hpn-like protein (HP1432). A
question mark is attached to pathways that could not be completely elucidated.
Pathways or steps for which no enzymes were identified are represented by a
red arrow. Pathways for macromolecular biosynthesis (RNA, DNA and fatty
acids) have been omitted. ackA, acetate kinase; acnB, aconitase B; aspC,
aspartate aminotransferase; dld, D-lactate dehydrogenase; gdhA, glutamate
dehydrogenase; glnA, glutamine synthetase; gltA, citrate synthase; HydABC,
hydrogenase complex; icd, isocitrate dehydrogenase; pfl, pyruvate formate
lyase; por, pyruvate ferredoxin oxidoreductase; ppc, phosphoenolpyruvate
carboxylase; pps, phosphoenolpyruvate synthase; pta, phosphate
acetyltransferase; glpD, glycerol-3-phosphate dehydrogenase; NDH-1, NADH-ubiquinone
oxidoreductase complex.
proteins, sensor-regulator pairs and other proteins may be
acid-induced.

Regulation of gene expression 

Bacteria regulate the transcription of their genes in response to 
many environmental stimuli, such as nutrient availability, cell 
density, pH, contact with target tissue, DNA-damaging agents, 
temperature and osmolarity. In the case of pathogens, the regulated 
expression of certain key genes is essential for successful evasion of 
host responses and colonization, adaptation to different body sites, 
and survival as the pathogen passes to new hosts. In H. pylori, global 
regulatory proteins are less abundant than in E. coli. For example,
orthologues of many DNA-binding proteins that regulate the
expression of certain operons such as OxyR (oxidative stress), Crp
(carbon utilization), RpoH (heat shock), and Fnr (fumarate and
nitrate regulation) are absent. Only four H. pylori proteins have a
perfect match to helix-turn-helix (HTH) motifs, a signature of
transcription factors: a putative heat-shock protein (HspR), two
proteins with no database match (HP1124 and HP1349) and SecA, a
component of the general secretory machinery. In contrast, 34
proteins containing an HTH motif were found in H. influenzae
and 148 in E. coli. We identified several other putative regulatory
functions, including SpoT and CstA for 'stringent response' to
amino-acid starvation and to carbon starvation, respectively.

Environmental response requires sensing changes and transmission 
of this information to cellular regulatory networks. Two-component 
regulator systems, consisting of a membrane histidine kinase sensor 
protein and a cytoplasmic DNA-binding response regulator, provide a 
well studied mechanism for such signal transduction. Four sensor 
proteins and seven response regulators were found in H. pylori,
similar to the number found in H. influenzae 7 . This is approximately 
one third the number found in E. coli which, in contrast to H. pylori 
and H. influenzae, may be exposed to more environments. 

Metabolism 

Metabolic pathway analysis of the H. pylori genome suggests the 
following features. H. pylori uses glucose as the only source of 
carbohydrate and the main source for substrate-level phosphoryla- 
tion. It also derives energy from the degradation of serine, alanine, 
aspartate and proline. The glycolysis-gluconeogenesis metabolic 
axis constitutes the backbone of energy production and the start 
point of many biosynthetic pathways. The biosynthesis of peptido- 
glycan, phospholipids, aromatic amino acids, fatty acids and cofac- 
tors is derived from acetyl-CoA or from intermediates in the 
glycolytic pathway (Fig. 6). The metabolism of pyruvate reflects 
the microaerophilic character of this organism. Neither the aerobic
pyruvate dehydrogenase (aceEF) nor the strictly anaerobic pyruvate
formate lyase (pfl) associated with mixed-acid fermentation is
present. The conversion of pyruvate to acetyl-CoA is performed by
the pyruvate ferredoxin oxidoreductase (POR), a four-subunit
enzyme thus far only described in hyperthermophilic organisms 41 .
The tricarboxylic acid (TCA) cycle is incomplete and the glyoxylate
shunt is absent. The analysis of degradative pathways, uptake
systems and biosynthetic pathways for pyrimidine, purine and
haem suggests that H. pylori uses several substrates as nitrogen
source, including urea, ammonia, alanine, serine and glutamine.
The assimilation of ammonia, an abundant product of urease
activity, is achieved by the glutamine synthetase enzyme, and
α-ketoglutarate is transformed into glutamate by glutamate
dehydrogenase rather than by the glutamate synthase enzyme.

In H. pylori, proton translocation is mediated by the NDH-1
dehydrogenase and the different cytochromes, including the
primitive-type cytochrome cbb3 (Table 2). Four respiratory
electron-generating dehydrogenases have been identified:
glycerol-3-phosphate dehydrogenase (GlpD), D-lactate dehydrogenase,
NADH-ubiquinone oxidoreductase complex (NDH-1), and a
hydrogenase complex (HydABC). Our analysis also suggests that
H. pylori is not able to use nitrate, nitrite, dimethylsulphoxide,
trimethylamine N-oxide or thiosulphate as electron acceptors.
Much of our metabolic analysis is supported by experimental
evidence.

Evolutionary relationships of H. pylori

H. pylori is currently classified in the Proteobacteria, a large
division of Gram-negative bacteria which includes two
completely sequenced species, H. influenzae and E. coli. Given this
taxonomic placement, based primarily on 16S rRNA sequence
comparisons, one might expect the proteins of H. pylori more
closely to resemble their H. influenzae and E. coli homologues
rather than those in other genomes such as Synechocystis sp., M.
genitalium, M. pneumoniae, M. jannaschii, and Saccharomyces
cerevisiae. This is indeed the case for many proteins. There are,
however, many examples of H. pylori proteins in amino acid
biosynthesis, energy metabolism, translation and cellular processes
that have greater sequence similarity to those found in
non-Proteobacteria. For example, Dhs1, the initial enzyme in the
chorismate biosynthesis pathway, is 75.5% similar to the Arabidopsis
thaliana chloroplast Dhs1 gene product, and has minimal sequence
similarity to the equivalent E. coli AroH, AroF or AroG gene
products. The remaining enzymes in this pathway have strong
sequence similarity to their E. coli counterparts. Similarly the H.
pylori prephenate dehydrogenase (TyrA), which converts chorismate
to tyrosine, and six out of 15 enzymes in the aspartate amino
acid biosynthetic pathways, resemble those from B. subtilis. A
similar pattern can be seen in a different functional category.
Nearly all H. pylori tRNA synthetases have eubacterial homologues,
mostly with best matches to Proteobacteria species. However,
histidyl-tRNA synthetase shows several amino-acid sequence
signatures in common with eukaryotic and archaeal (M. jannaschii)
homologues.

Such observations of discordant sequence similarity are often
interpreted as evidence of lateral gene transfer in the evolutionary
history of an organism. It is also possible that H. pylori diverged
early from the lineage that led to the gamma Proteobacteria, and
retained more ancient forms of enzymes that have been
subsequently replaced or have diverged extensively in H. influenzae and
E. coli.

Conclusion 

Our whole-genome analysis of H. pylori gives new insight into its
pathogenesis, acid tolerance, antigenic variation and microaerophilic
character. The availability of the complete genome sequence will
allow further assessment of H. pylori genetic diversity. This is an
important aspect of H. pylori epidemiology as allelic polymorphism
within several loci has already been associated with disease
outcome 5,21 * 51 . The extent of molecular mimicry between H. pylori
and its human host, an underappreciated topic, can now be fully
explored 43 . The identification of many new putative virulence
determinants should allow critical tests of their roles and thus
new insight into mechanisms of initial colonization, persistence
of this bacterium during long-term carriage, and the mechanisms
by which it promotes various gastroduodenal diseases.

Methods
H. pylori strain 26695 (ref. 44) was originally isolated from a patient in the
United Kingdom with gastritis (K. Eaton, personal communication) and was
chosen because it colonizes piglets and elicits immune and inflammatory
responses. It is also toxigenic, and transformable, and thus amenable to
mutational tests of gene function.
The H. pylori genome sequence was obtained by a whole-genome random
sequencing method previously applied to genomes of Haemophilus influenzae,
Mycoplasma genitalium, and Methanococcus jannaschii. Ninety-two per cent
of the genome was covered by at least one λ clone and only 0.56% of the
genome had single-fold coverage.
Open reading frames (ORFs) and predicted coding regions were identified
using three methods. The predicted protein-coding regions were initially
defined by searching for ORFs longer than 80 codons. Coding potential analysis
of the entire genome was performed with a version of GeneMark trained with
H. pylori ORFs longer than 600 nucleotides. Coding sequences and
potential starts of translation were also determined using GeneSmith (H.S.,
unpublished), a program that evaluates ORF length, separation of ORFs and
overlap and quality of ribosome binding site. ORFs with low GeneMark coding
potential, no database match, and not retained by GeneSmith were eliminated.
GeneSmith identified 25 ORFs that are smaller than 100 codons, had no
database match and were GeneMark negative. Frameshifts were detected by
examining pairwise alignments, families of orthologues (similar proteins
from different species) and paralogues (similar proteins from within
the same organism), and regions containing homopolymer stretches and
dinucleotide repeats. Ambiguities were resolved by an alternative sequencing
chemistry (terminator reactions), and by sequencing PCR products obtained
with the genomic DNA as template. Frameshifts that remain in the genome are
considered authentic and not sequencing artefacts.
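
The first of these steps, collecting ORFs longer than 80 codons, can be illustrated with a short Python sketch. It is not the pipeline used for the genome: the start-codon set (ATG, GTG, TTG), the start-to-stop definition of an ORF, and all function names are assumptions made for the example, and GeneMark and GeneSmith scoring are not modelled.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumed start codons for this sketch
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def _scan_strand(s, min_codons):
    """Yield (start, end, n_codons) for start-to-stop ORFs in all three frames."""
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if orf_start is None:
                if codon in STARTS:
                    orf_start = i  # first start codon after the previous stop
            elif codon in STOPS:
                n_codons = (i - orf_start) // 3  # stop codon not counted
                if n_codons > min_codons:
                    yield orf_start, i + 3, n_codons
                orf_start = None


def find_orfs(seq, min_codons=80):
    """Return (strand, start, end, n_codons) for ORFs longer than min_codons.

    Coordinates are 0-based, end-exclusive, on the plus strand.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = [("+", a, b, c) for a, b, c in _scan_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    orfs += [("-", n - b, n - a, c) for a, b, c in _scan_strand(rc, min_codons)]
    return orfs


# Toy sequence: one 101-codon ORF on the plus strand.
toy = "CC" + "ATG" + "GCT" * 100 + "TAA" + "G"
print(find_orfs(toy))
```

A length cut-off alone over-predicts genes, which is why the text combines it with coding-potential scoring and database matches before an ORF is retained.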

To determine their identity, ORFs were searched against a non-redundant
amino-acid database as previously described 9 . ORFs were also analysed using
hidden Markov models constructed for a number of conserved protein
families (pfam v1.0) using hmmer 43 . In addition, all ORFs were searched
against the prosite motif database using MacPattern 4 *. Families of paralogues
were constructed by pairwise searches of proteins using FASTA. Matches that
spanned at least 60% of the smaller of the protein pair were retained and
visually inspected.
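
The 60% coverage rule for retaining a pairwise match can be sketched as follows. The text does not say how retained matches were merged into families, so the single-linkage grouping used here is an assumption, as are the function name and the input format (protein lengths plus aligned lengths taken from some pairwise search such as FASTA).

```python
def paralogue_families(lengths, matches, min_coverage=0.60):
    """Group proteins into families from pairwise matches.

    lengths: dict mapping protein name to its length in residues.
    matches: iterable of (name_a, name_b, aligned_length) tuples.
    A match is kept when the aligned length covers at least
    min_coverage of the shorter protein. Kept matches are merged
    by single linkage (an assumption of this sketch).
    """
    parent = {name: name for name in lengths}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, aligned in matches:
        shorter = min(lengths[a], lengths[b])
        if aligned >= min_coverage * shorter:
            parent[find(a)] = find(b)

    groups = {}
    for name in lengths:
        groups.setdefault(find(name), []).append(name)
    # Report only groups with more than one member, in a stable order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical proteins and match lengths, for illustration only.
lengths = {"P1": 100, "P2": 200, "P3": 150, "P4": 300}
matches = [("P1", "P2", 70), ("P2", "P3", 80), ("P3", "P4", 120)]
print(paralogue_families(lengths, matches))
```

Measuring coverage against the shorter protein keeps a small protein that matches one domain of a much larger one from being discarded, at the cost of linking multi-domain proteins more readily; the visual inspection mentioned in the text is the safeguard against that.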

A unix version of the program TopPred 47 was used to identify
membrane-spanning domains (MSD) in proteins. Six hundred and sixty three proteins
containing at least one MSD were found; of these, 300 had 2 potential MSDs or
more. The presence of signal peptides and the probable position of the cleavage
site in secreted proteins were detected using Signal-P, a neural net program that
has been trained on a curated set of secreted proteins from Gram-negative
bacteria 48 . 367 proteins were predicted to have a signal peptide. Lipoproteins
were identified by scanning for the presence of a lipobox in the first 30 amino
acids of every protein; 20 lipoproteins were identified, eighteen of which were
Signal-P positive. Outer-membrane proteins were found by searching for
aromatic amino acids at the end of the proteins.
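
The two simplest of these screens, the lipobox scan and the C-terminal aromatic check, can be sketched in Python. The text does not give the lipobox pattern that was used, so the consensus below, [LVI][ASTVI][GAS]C, is an assumption taken from the commonly quoted bacterial lipobox motif; checking only the final residue for F, W or Y is likewise a simplification, and the function names are hypothetical.

```python
import re

# Assumed lipobox consensus; the pattern used for the genome is not stated.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")
AROMATIC = set("FWY")


def has_lipobox(protein, window=30):
    """True if the assumed lipobox motif lies within the first `window` residues."""
    return LIPOBOX.search(protein[:window]) is not None


def ends_with_aromatic(protein):
    """True if the last residue is aromatic (F, W or Y)."""
    return bool(protein) and protein[-1] in AROMATIC


# Made-up sequences, for illustration only.
print(has_lipobox("MKKTLLLSLLAGCSSHA" + "A" * 40))
print(ends_with_aromatic("MKAF"))
```

Both screens are cheap filters rather than predictions, which fits the text's note that every feature found this way was checked by eye.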

Homopolymer and dinucleotide repeats were found by using RepScan
(O.S., unpublished) which finds direct repeats of any length. All features
identified using these programs were validated by visual inspection to remove
false positives. Metabolic pathways were curated by hand and by reference to
The genome of the bacterium Borrelia burgdorferi B31, the aetiologic agent of Lyme disease, contains a linear
chromosome of 910,725 base pairs and at least 17 linear and circular plasmids with a combined size of more than
533,000 base pairs. The chromosome contains 853 genes encoding a basic set of proteins for DNA replication,
transcription, translation, solute transport and energy metabolism, but, like Mycoplasma genitalium, it contains no
genes for cellular biosynthetic reactions. Because B. burgdorferi and M. genitalium are distantly related eubacteria,
we suggest that their limited metabolic capacities reflect convergent evolution by gene loss from more metabolically
competent progenitors. Of 430 genes on 11 plasmids, most have no known biological function; 39% of plasmid genes
are paralogues that form 47 gene families. The biological significance of the multiple plasmid-encoded genes is not
clear, although they may be involved in antigenic variation or immune evasion.




In the mid-1970s, a geographic clustering of an unusual rheumatoid 
arthritis-like condition was reported in Connecticut 1 . That cluster 
of cases focused attention on the syndrome that is now called Lyme 
disease. It was subsequently realized that a similar disorder had been 
known in Europe since the beginning of this century. Lyme disease is 
characterized by some or all of the following manifestations: an 
initial erythematous annular rash, 'flu-like symptoms, neurological 
complications, and arthritis in about 50% of untreated patients 2 . In 
the United States, the disease occurs primarily in northeastern and 
midwestern states, and in western parts of California and Oregon. 
These regions coincide with the ranges of various species of Ixodes 
ticks, the primary vector of Lyme disease. Lyme disease is now the 
most common tick-transmitted illness in the United States, and 
has been reported in many temperate parts of the Northern 
Hemisphere. 

It was not until the early 1980s that a new spirochaete, Borrelia 
burgdorferi 3 , was isolated and cultured from the midgut of Ixodes 
ticks, and subsequently from patients with Lyme disease 4,5 . Analysis 
of genetic diversity among individual Borrelia isolates has defined a 
closely related cluster containing at least 10 tick-borne species of 
Lyme disease agents, called 'B. burgdorferi (sensu lato)'. B. burgdorferi 
resembles most other spirochaetes in that it is a highly 
specialized, motile, two-membrane, spiral-shaped bacterium that 
lives primarily as an extracellular pathogen. Borrelia is fastidious 
and difficult to culture in vitro, requiring a specially enriched medium 
and low oxygen tension 6 . 

One of the most striking features of B. burgdorferi is its unusual 
genome, which includes a linear chromosome approximately one 
megabase in size 7-10 and numerous linear and circular plasmids 11-13 , 
with some isolates containing up to 20 different plasmids. The 
plasmids have a copy number of approximately one per chromo- 
some 10,14 , and different plasmids often appear to share regions of 
homologous DNA 13,15,16 . Long-term culture of B. burgdorferi results 
in the loss of some plasmids, changes in protein expression profiles, 



and a loss in the ability of the organism to infect laboratory animals, 
suggesting that the plasmids encode important proteins involved in 
virulence 17-19 . 



Because of its importance as a pathogen of humans and animals, 
and the value of complete genome sequence information for understanding 
its life cycle and advancing drug and vaccine development, 
we sequenced the genome of B. burgdorferi type strain (B31), using 
the random sequencing method previously described 20-24 . Here we 
summarize the results from sequencing, assembly and analysis of 
the linear chromosome and 11 plasmids. 

Chromosome analysis 

The linear chromosome of B. burgdorferi has 910,725 base pairs (bp) 
and an average G+C content of 28.6%. Base pair one represents the 
first double-stranded base pair that we observed at the left telomere. 
Previous genome characterizations agree with the nucleotide 
sequence of the large chromosome 10,25-28 . The 853 predicted 
coding sequences (open reading frames; ORFs) have an average 
size of 992 bp, similar to that observed in other prokaryotic 
genomes, with 93% of the B. burgdorferi genome representing 






Figure 1 Linear representations of the B. burgdorferi B31 chromosome and 
plasmids. The location of predicted coding regions colour-coded by biological 
role, RNA genes, and tRNAs is indicated. Arrows represent the direction of 
transcription for each predicted coding region. Numbers associated with tRNA 
symbols represent the number of tRNAs at a locus. Numbers associated with 
GES represent the number of membrane-spanning domains according to the 
Goldman, Engelman and Steitz scale as calculated by TopPred* 9 . Only proteins 
with five or more GES are indicated. Members of paralogous gene families are 
identified by family number. Transporter abbreviations: mal, maltose; P, gly and 
bet, proline, glycine, betaine; glyc, glycerol; aa, amino acid; E, glutamate; fru, 
fructose; glu, glucose; s/p, spermidine/putrescine; pan, pantothenate; Pi, phosphate; 
lac, lactate; rib, ribose; ?, unknown. 
Amino acid biosynthesis 
Biosynthesis of cofactors, prosthetic groups, carriers 
Cell envelope 
Cellular processes 
Central intermediary metabolism 
Energy metabolism 
Fatty acid/phospholipid metabolism 
Purines, pyrimidines, nucleosides and nucleotides 
Regulatory functions 
Replication 
Transport/binding proteins 
Translation 
Transcription 
Other categories 
Conserved hypothetical 
Unknown 



Table 2. Identification of B. burgdorferi genes 

Bfi£ identification fSoe^ ) %Sjm 



Amino acid biosynthesis 
Serine family 
BB601 serine OHMTase (glyA) {Ec} 

Biosynthesis of cofactors, prosthetic 
groups, and carriers 
Folic acid 
BB026 methylenetetrahydrofolate DHase (folD) {Bs} 
Heme and porphyrin 
BB197 protoporphyrinogen oxidase, put {Bs} 
BB656 oxygen-independent coproporphyrinogen III oxidase, put {Bs} 
Menaquinone and ubiquinone 
BB314 octaprenyl-diP Sase (ispB) {Ec} 
Pantothenate 
BB812 pantothenate metabolism flavoprt (dfp) {Ec} 
Pyridoxine 
BB768 pyridoxal kinase (pdxK) {Sc} 
Thiamine 
BB621 4-methyl-5(b-OHethyl)-thiazole monoP biosyn prt (thiJ) {Ec} 
Pyridine nucleotides 
BB522 NH(3)-dep NAD+ Sase {Rc} 



73 



61 



ID28-3 
BBH20 

BBH21 



outer membrane porin (oms28) 
put {Bb} 

outer membrane porin (oms28) 
put {Bb} 



Id25 

BBE09 prt p23 {Bb} 
CP26 

ddIH ° Uter surface C (ospC){Bb} 

dBS07 outer surface prt, put {Bb} 



74 



93 



100 
45 



BB29 
BB293 
BB294 
BB774 
BB271 
BB272 
BB273 
BB274 
BB275 
BB276 
BB147 



CC9 



56 BBC06 exported prt A (eppA) {Bb} ioo 



57 



58 



47 



Murein sacculus and peptidoglycan 
BB160 alanine racemase (alr) {Ec} 
BB582 CPDase, put {Bs} 
BB200 D-alanine-D-alanine ligase (ddlA) {Ec} 
dd!? 0 8'utamate racemase (murl) {Ec} 
BB625 N-Acmuramoyi-L-alanine amidase 

put {E0 

BB732 penicillin-BP (pbp-3) {Ng} 
BB136 penicillin-BP (pbp-1) {Nm} 
BB718 penicillin-BP (pbp-2) {Hi} 
BB303 phospho-N-Acmuramoyl-pentapeptideTase (mraY) {Bb} 



54 
54 



BB663 

BB284 
BB283 
-BB181 
BB149 

BB182 



100 
100 
100 
68 
100 
100 
100 
100 
100 
100 

100 



100 
100 
99 



58 



65 



Cell envelope 

Membranes, lipoproteins, and porins 
basic membrane prt B (bmpB) {Bb} 100 
basic membrane prt A (bmpA) {Bb} 100 
basic membrane prt C (bmpC) {Bb} 100 
basic membrane prt D (bmpD) {Bb} 100 
basic membrane prt {Tp} 50 
exported prt (tpn38b) {Tp} 50 
fibronectin/fibrinogen-BP, put {Sp} 53 
inner membrane prt {Hi} 63 
lipoprotein LA7 {Bb} ^ 
membrane-associated prt 
P66 {Bb} 100 
membrane spanning prt, put {Se} 49 
outer membrane prt {Ng} 48 
outer membrane prt (tpn50) {Tp} 48 
rare lipoprt A (rlpA) {Hi} 58 
surface-located membrane 
Prt 1 (Imp!) {Mh} 4 c 
S2 pa (Bb} 65 



BB382 
BB383 
BB384 
BB385 
8B108 
BB319 
BB347 
BB442 
BB365 



BB753 
BB795 
BB167 
BB735 
BBtO 

BB158 



Id54 
BBA24 
BBA25 
B8A36 
BBA59 
BBA62 
BBA74 

BBA15 
BBA16 
BBA03 
BBA52 
BBA05 
BBA04 
BBA60 

1C2S 
BBJ09 
BBJ50 
BBJ5J 
BBJ52 



decorin BP A (dbpA) {Bb} 
decorin BP B (dbpB} {Bb} 
lipoprotein {Bb} 
lipoprotein (Bb} 
lipoprotein {Bb} 
outer membrane porin 
(oms28) {Bb} 

outer surface prt A (ospA) (Bb) 
outer surface prt B (ospB) {Bb} 
outer membrane prt {Bb} 
outer membrane prt {Bb} 

51 prt {Bb} 

52 prt {Bb} 

surface lipoprt P27 {Bb} 

outer surface prt D (ospD) {Bb} 
outer membrane prt, put {Bb} 
vtsE! pa put {Bb} 
vIsEl pa put {Bb} 



94 
100 
54 
100 
100 

100 
99 
99 
100 
100 
100 
100 
■81 



rod shape-determining 
prt (mreB-1) {Ec} 
rod shape-determining prt 
(mreC) {Bs) 

rod shape-determining pn 
(mreB-2) (Hi) 

serine-type D-Ala-D-Ala CPDase 
(dacA) {Hi} 

U DP-NAG 1-carboxy-vinyiTase 
{murA) {Ec} 

UDP-N-Acmuramate DHase 
(murB) {Bs} 

UDP-N-Acmuramate-alanine ligase 
(murC) {Hi} 

UDP-N-Acmuramoylalanine-D- 
glutamate 
ligase (murO} {Bs} 
UDP-N-Acmuramoylalanyl- 
D-glutamate-2,6- 
-diaminopimeiate ligase (murE) (Hi] 53 
UDP-N-Acmuramoylalanyl-D- 
glutamyl-2,6-diamino-pimelate- 
D-alanyl-D-alanine ligase 
(murF) {Bb} 100 

UDP-N-Acglucosamine-N- 
AcmuramyKpentapeptide) 
pyrophosphoryl-undecaprenol 
NAG Tase (murG) {Bs} 55 

Surface polysaccharides, ^polysaccharides 
and antigens 

BB744 antigen, p83/l00 {Bb} 100 
BB572 glycosyl Tase (IgtD) {Hi} 56 



BB715 
BB716 
BB719 
BB605 
BB472 
BB598 
BB817 
BB585 

BB201 

BB304 

BB767 



60 




56 


BB775 


53 


* 

*BB292 


52 




52 


BB280 


52 






BB281 


100 






BB277 




BB278 


73 


BB221 




BB290 


51 


BB772 




BB279 


61 


BB282 




BB285 


55 


. BB286 




BB287 


59 


BB180 




BB550 


55 


B8270 




BB288 


54 






Cellular 




General 


52 


BB567 



flgr basal-body rod prt (fliF) {Bb} 
flgr basal-body rod pn (flgC) {Bb} 
flgr basal-body rod prt (flgB) {Bb} 
flgr basal-body rod prt (flgG) {Sc} 
flgr biosyn prt {flhA} {Bb} 
flgr biosyn prt (f!hB) {Bb} 
flgr biosyn prt (fliR) {Bb} 
flgr biosyn prt (fliQ) (Bb) 
flgr biosyn prt (fliP) {Bb} 
flgr biosyn prt (fliZ} (Bb) 
flgr filament 41 kDa core pn 
(flaB) {Bb} 

flgr filament outer layer pn (flaA) 
(Bb) 

flgr hook assembly pn (flgDJ (Bb) 
flgr hook pn (figE) {Bb} 
flgr hook-associated pn {flgK) {Bb} 
flgr hook-associated pn 2 
(fliD) (Bb} 

flgr hook-associated pn 3 
(figl) (Bb} 

ftgr hook-basal body complex 
pn (flhO) (Bs} 

•figr hook-basal body complex 
pn (fliE) (Bb) 
flgr motor rotation pn B 
(motB) (BbJ 

flgr motor rotation pn A 
(motA) (Bb} 100 
flgr motor switch pn (fliN) {Bb} 100 
flgr motor switch pn (fliM) {Bb} 100 
flgr motor switch pn (fliG-1) {Td} 54 
ftgr motor switch pn (fliG-2) {Bb} 100 
flgr P-ring pn (flgl) {Ar} 5! 
flgr pn (fliL) {Bb} 100 
flgr pn (ftbD) {Bb} 99 
flgr pn (fibC) {Bb} 100 
flgr pn (fibB) {Bb} 1C0 
ftgr pn (fibAj {Bb} 100 
flgr pn. put (Bb) 10 o 

flgr pn (flaJ) (Vp} 57 
flgr-associated GTP-BP (flhF) {Bb} 100 
flgr-specific ATP Sase (flil) {Bb} 100 



99 



99 



56 



100 
100 



BBA64 antigen, P35 {Bb} 

BBA66 antigen, P35. put {Bb} 

BBA73 antigen, P35, put {Bb} 

IP38 

BBJ41 antigen, P35. put {Bb} 



100 

48 

52 



47 



BB669 

BB040 

BB414 

BB551 

BB570 

BB672 

BB660 
BB781 
BB578 
BB596 
BB597 



1036 

BBK53 outer membrane pn {Bb} 

BBK52 prtp23{Bb} 

IP28-1 

BBFOi erpD pa put {Bb} 

BBF22 prt p23. put {Bb} 

BBF32 vis recombination cassette 
VIs3-16 {Bb} - 



100 
61 
62 
55 



91 



53 
78 

100 



BBK15 
BBK50 
BBK32 
B8K37 
BBK45 
BBK46 
BBK48 

ID28-3 
BBH32 



BBI36 antigen, P35, put (Bb} 

Io2§ 

BBE31 antigen, P35, put {Bb} 

Surface structures 

BB289 flgr assembly pn (fliH) {Bb} 



antigen, P35, put {Bb} 
immunogenic pn P37 (Bb} 
immunogenic pn P35. put {Bb} 
immunogenic pn P37, put {Bb} 
immunogenic prt P37, put {Bb} 
immunogenic prt P37, put {Bb} 
immunogenic pn P37, put {Bb} 

antigen, P35. put {Bb} 



50 
99 
97 
71 
49 
68 
49 



55 
48 
54 
100 



BB681 
BB415 

BB568 

BB312 

BB565 

BB670 



chemotaxis histidine Kase 
(cheA-l) {Bb} 
chemotaxis histidine Kas a 
(cheA-2) {Bb} 

chemotaxis prt MTase (cheR-1 } 
(RsJ 

chemotaxis pn MTase (cheR-2) 
(Rm) ; 
chemotaxis response regulator 
(cheY-l}{Tp} 

chemotaxis response regulator 
(cheY-2) {Rm} 

chemotaxis response regulator 

(cheY-3) {Bb} 

GTP-BP (era} {Ec} 

GTP-BP (obg) {Syn} 

MC pn (mcp-l) {Tm} 

MC prt (mcp-2) {Td} 

MC pn (mcp~3} {Td} 

MC pn (mcp-4) {Ec} 

MC pn (mcp-5) {Ec} 

pn-glutamate methylesterase 

(cheB-1) {Sc} 

pn-glutamate methylesterase 
(cheB-2J {Sc} 

purine-B chemotaxis pn (cheW-1) 
{Bb} 

purine-B chemotaxis pn 
(cheW-2) (Ec} 

purine-Bchemotaxis pn (cheW-3) 
{Bb} 



99 

100 

61 

57 

74 

70 

98 
62 
63 
52 
57 
52 
61 
52 

63 



58 



100 



Cell division 

BB058 cell division control pn 27, put {Mj} 46 
cell division control pn 27, put {Mi*} 54 

rOlI rlit/i(>inp. inklkU ,r-. . ^ 



BB195 
8B361 
BB299 
BB789 
BB076 
BB257 
BB300 
BB302 
BB301 
BB313 
BB434 



, vv , yj, K t/> jjyj 

cell division inhibitor, put {Bs} 

cell division pn (ftsZ) (Bs} 72 

cell division pn (ftsH) (BsJ 72 

cell division pa put {Ec} 71 

cell division pn, put {Ec} 71 

cell division pn (ftsAJ {Bb} 100 

cell division pn (ftsW) {Bb} 100 

cell division pn (divlBJ {Bb} 100 

cell division pn (ftsJ) (Mj} 59 
stage 0 spoliation pn J (spoOJ} 

{Bs} 33 



anmr 

68 






lp28-2 
BBG08 stage 0 sporulation prt J (spo0J) {Bb} 66 

Cell killing 
BB143 -hemolysin (hlyA) {Ah} 62 
BB117 hemolysin III (yplQ) {Bs} 61 
BB506 hemolysin (tlyA) {Sh} 59 
BB059 hemolysin (tlyC) {Sh} 65 
BB202 hemolysin, put {Syn} 54 

Chaperones 
BB741 chaperonin (groES) {Pg} 77 
BB602 chaperonin, put {Cb} 72 
BB519 grpE prt (grpE) {Bb} 100 
BB295 heat shock prt (hslU) {Bb} 100 
BB296 heat shock prt (hslV) {Bb} 100 
BB649 heat shock prt (groEL) {Bb} 100 
BB517 heat shock prt (dnaJ-1) {Bb} 100 
BB655 heat shock prt (dnaJ-2) {Ca} 59 
BB264 heat shock prt 70 (dnaK-1) {Bb} 61 
BB518 heat shock prt 70 (dnaK-2) {Bb} 100 
BB560 heat shock prt 90 (htpG) {Bb} 100 

Detoxification 
BB153 superoxide dismutase (sodA) {Hi} 68 
BB690 neutrophil activating prt (napA) {Hi} 57 
BB179 thiophene and furan oxidation prt (thdF) {Bb} 100 

Protein and peptide secretion 
BB154 preprt translocase sub (secA) {Bb} 100 
BB395 preprt translocase sub (secE) {Bl} 62 
BB498 preprt translocase sub (secY) {Sc} 64 
BB362 prolipoprt diacylglyceryl Tase (lgt) {Ec} 56 
BB652 prt-export membrane prt (secD) {Ec} 63 
BB653 prt-export membrane prt (secF) {Hi} 63 
BB030 signal peptidase I (lepB-1) {Bs} 51 
BB031 signal peptidase I (lepB-2) {Syn} 57 
BB263 signal peptidase I (lepB-3) {St} 57 
BB469 signal peptidase II (lsp) {Sc} 60 
BB694 signal recognition particle prt (ffh) {Bs} 70 
BB610 trigger factor (tig) {Hi} 50 



Transformation 

BB591 competence locus E, put {Bs} 
BB798 competence prt E put {Hi} 

Central intermediary metabolism 
General 
BB241 glycerol kinase (glpK) {Ec} 
BB243 glycerol-3-P DHase, anaerobic (glpA) {Hi} 
BB376 SAM Sase (metK) {Bs} 

Amino sugars 
BB152 glucosamine-6-P isomerase (nagB) {Hi} 
BB151 N-Acglucosamine-6-P deAcase (nagA) {Hi} 

Degradation of polysaccharides 
BB620 -glucosidase, put {Syn} 
BB002 -N-Achexosaminidase, put {As} 

Phosphorus compounds 
BB533 phnP prt (phnP) {Ec} 

Polysaccharides - (cytoplasmic) 
BB166 4- -glucanoTase (malQ) {Syn} 55 
BB004 phosphoglucomutase (femD) {Mj} 52 
BB835 phosphomannomutase (cpsG) {Hi} 57 



Energy metabolism 
Aerobic 
BB728 NADH oxidase, water-forming (nox) {Sh} 

ATP-proton motive force interconversion 
BB094 V-type ATPase, sub A (atpA) {Mb} 64 
BB093 V-type ATPase, sub B (atpB) {Mb} 62 
BB092 V-type ATPase, sub D (atpD) {Mj} 51 
BB096 V-type ATPase, sub E (atpE) {Mj} 54 
BB091 V-type ATPase, sub I (atpI) {Eh} 53 
BB090 V-type ATPase, sub K (atpK) {Mj} 54 



BB575 CTP Sase (pyrG) (Mj) 



Electron transport 
BB061 thioredoxin (trxA) {Ec} 
BB515 thioredoxin RDase (trxB) {Bb} 

Fermentation 
BB622 acetate kinase (ackA) {Ec} 
BB589 P AcTase (pta) {Tt} 

Glycolysis 
BB337 enolase (eno) {Bs} 
BB445 fructose-bisP aldolase (fba) {Ec} 
BB730 glucose-6-P isomerase (pgi) {Pf} 
BB057 glyceraldehyde 3-P DHase (gap) {Bb} 
BB630 1-phosphofructoKase (fruK) {Hi} 
BB056 phosphoglycerate Kase (pgk) {Bb} 
BB658 phosphoglycerate mutase (gpmA) {Ec} 
BB348 pyruvate Kase (pyk) {Bs} 
BB727 pyroP-fructose 6-P 1-PPTase (pfk) {Eh} 
BB020 pyroP-fructose 6-P 1-PPTase, sub (pfpB) {Bb} 
BB055 trioseP isomerase {Bb} 

Pentose phosphate pathway 
BB222 glucose-6-P 1-DHase, put {As} 
BB636 glucose-6-P 1-DHase (zwf) {Hi} 
BB561 phosphogluconate DHase (gnd) {Sd} 
BB657 ribose 5-P isomerase (rpi) {Mj} 

Sugars 
BB407 

BB444 
BB676 
BB207 

BB545 



59 
99 



63 
65 



mannose-6-P isomerase 
{manA) {Ec} 

nucleotide sugar epimerase {Vc} 
phosphoglycolate PPase (gph) {Hi 
UTPrglucose-1-P uridylylTase 
(gtaB) {Bs} 

xylulokinase (xylB) {Bs} 



Fatty acid and phospholipid metabolism 





General 






54 


BB037 


i-acyl-sn-glycerol-3-P AcTase 




52 




(pIsC) {Bb} 


100 




BB685 


3-OH-3-rnethylglutaryl-CoA 








RDase {mvaA} {Pm} 


52 




BB683 


3-OH-3-methytglutaryl-CoA 




74 




Sase {At} 


53 




BB109 


Ac-CoA C-AcTase (fadA) {Hi} 


67 


52 


BB704 


acyl carrier prt {Syn} 


65 


72 


BB721 


CDP-diacyIglycerol-glycerol-3-P 








3-phosphatidylTase {Bs} 


55 




BB327 


glycerol-3-P O-acylTase, put {So} 


50 




BB368 


glycerol-3-P DHase, NAD(P)+ (gpsA) 


79 




{Bs} 


54 




,BB137 


long-chain-fatty-acid CoA ligase 




54 * 


{Syn} 


54 




BB593 


tong-cha in-fatty-acid CoA ligase 








{Syn} 


56 


58 


BB688 


melvalonate Kase {Mj} 


51 


54 


BB686 


mevalonate pyroP DCase {Sc} 


52 




BB119 


phosphatidate cytidylylTase (cdsA), 








AFS{Ec} 


61 


48 


BB249 


phosphatidylTase {Hp} 


52 




BB687 


phosphomevalonate Kase, put {Sc} 


53 



Amino acids and amines 
BB841 arginine deiminase {arcA) {Cp} 
BB842 ornithine carbamoylTase 
(arcB) {Ng} 

Anaerobic 

BB016 glpE prt (glpE) {Hi] 
BB087 Uactate DHase (Idh) {Bs} 



59 



75 



74 



53 
72 



Purines, pyrimidines, nucleosides, nucleotides 
Nucleotide and nucleoside interconversion 
BB417 adenylate kinase (adk) {Bs} 64 
BB128 cytidylate kinase (cmk-1) {Bs} 58 
BB819 cytidylate kinase (cmk-2) {Mj} 57 
BB463 nucleoside-diP kinase (ndk) {Bs} 70 
BB793 thymidylate kinase (tmk) {Mj} 59 
BB571 uridylate kinase (smbA) {Mj} 54 

Purine ribonucleotide biosynthesis 
BB544 phosphoribosyl pyroP Sase (prs) {Mp} 59 
cp26 
BBB18 GMP Sase (guaA) {Bb} 100 
BBB17 IMP DHase (guaB) {Bb} 100 

Pyrimidine ribonucleotide biosynthesis 



Salvage 
BB777 

8B618 
BB239 

BB375 
BB588 
BB791 
BB015 

Id36 
BBK17 



of nucleosides and nucleotides 
adenine phosphoribosylTase 
(apt) (Ta} 

cytidine deaminase (cdd) (Mp) 
deoxyguanosine/deoxyadenosine 
kinase(i) sub 2 (dck) {La} 
pfs'prt (pfs-1) (Ec) 
pfs prt (pfs-2) {Hi} 
thymidine kinase (tdk) {Bs} 
uridine kinase (udk) {Bb} 



adenine deaminase (adeC) {Bs} 
Regulatory functions 



carbon storage regulator 
(csrA) {Hi} 

ferric uptake regulation prt 
(fur) (Sp) 

guanosine-3'.5'-bis(diP) 3'- 
pyrophosphohydrolase (spoT) (Ec) 
htstidine phosphoKase/PPase, 
put (Ml) 

methanol DHase regulator 
(moxR) {Bb} 

pheromone shutdown prt 
(traB) (EH 

P transport system regulatory 
prt (phoU) {Pa} 

prt Kase C1 inhibitor (pkcl) {Bb} 
response regulatory prt 
(rrp-1)(Syn) 

response regulatory prt 
(rrp-2) {Ec} 

sensory transduction htstidine 
Kase, put (Bs) 
sensory transduction 
histldine Kase, put {Syn} 
xylose operon regulatory prt 
(xylR-1) {Th.J 

xylose operon regulatory 
prt (xylR-2) {Syn} 



chpAl prt, put {Ec} 

Replication 
Degradation of DNA 
BB411 endonuclease precursor 
(nucA) {As} 



63 
61 

59 
64 
59 
47 
100 



57 



79 


General 


80 


BB184 


62 






BB647 


99 




52 


BB198 


99 






BB737 


79 




62 


BB176 


65 


BB416 


100 


BB042 


100 






BB379 




BB419 


48 




64 


BB763 


71 


BB764 


61 






BB420 




BB693 


54 




69 


BB831 


50 




63 


1p54 


43 


BBA07 



63 



48 



49 



99 



57 
100 



57 



67 



60 



48 



55 



53 



DNA replication, restriction, modification, 
recombination, and repair 
BB422 3-methyladenine DNA glycosylase (mag) {At} 
BB827 ATP-dep helicase (hrpA) {Ec} 
BB437 chromosomal replication init prt (dnaA) {Bb} 
BB435 DNA gyrase, sub A (gyrA) {Bs} 
BB436 DNA gyrase, sub B (gyrB) {Bb} 
BB344 DNA helicase (uvrD) {Ec} 
BB552 DNA ligase (lig) {Ta} 
BB211 DNA mismatch repair prt (mutL) {Hi} 
BB797 DNA mismatch repair prt (mutS) {Hi} 
BB098 DNA mismatch repair prt, put {Syn} 
BB548 DNA polymerase I (polA) {Hi} 
BB579 DNA polymerase III, sub (dnaE) {Ec} 
BB438 DNA polymerase III, sub (dnaN) {Bb} 
BB461 DNA polymerase III, sub / (dnaX) {Bs} 
BB710 DNA primase (dnaG) {Bs} 
BB581 DNA recombinase (recG) {Syn} 
BB828 DNA topoisomerase I (topA) {Syn} 
BB035 DNA topoisomerase IV (parC) {Bb} 58 
BB036 DNA topoisomerase IV (parE) {Bb} 56 
BB745 endonuclease III (nth) {Syn} 
BB837 excinuclease ABC, sub A (uvrA) {Ec} 
BB836 excinuclease ABC, sub B (uvrB) {Ec} 
BB457 excinuclease ABC, sub C (uvrC) {Syn} 
BB534 exodeoxyribonuclease III (exoA) {Bs} 
BB632 exodeoxyribonuclease V, chain 



56 
61 

100 
67 
99 
55 
56 

55 

57 

51 
61 

62 

100 

61 
56 
60 
64 



59 



71 



57 



>: 3 

v3 



100 
100 
100 

61 



(recD)fEc) 

f^^ibonucleaseV, chain 
(recB) (Hi) 

BB634 exodeoxyribonuclease V 

RRftM chajn ( frecC) (Hi) 

BB829 exonuc ease SbcD (sbcD) {EcJ 

B8830 exonuclease SbcC (sbcC) {EcJ 

fBb}° S£H Q ' 6 diV Pft 8 < 9idB) 
BB178 glucose-inhibited div prt A 
DBM , (9icJA){BbJ 

88023 (WS^^ hto 

BB607 rep helicase, ss DNA-dep 

ATPase(rep){Hi} 
»' 1 1 repficattve DNA helicase 

(dnaB) [EcJ 
BB1 14 ss DIMA-BP (ssb) [SynJ 

BB254 ss-DNA-specific exonuclease 
(recJ) {HiJ 

88623 (m?iu^ 0n ' rePak C0UpIing factor 
BB053 uracil DNA gVcosylase(ung) {Hi} £ 

BBG32 replicative DNA helicase, put {Bsj 59 

8BE29 fH?'" 6 SPeCifi ° ° NA MTaSe * Put 

( P} 57 

Transcription 
General 

68052 spoil prt (spoUJ {EcJ 54 
Degradation of RNA 

88805 68 

Rr™ r r ^ onuc ' e ase H (rnhB) {Hi} 66 
BB705 nbonuc lease III (rnc) {Bsj 62 

(rnpAJ {Bb) 



54 


BB833 




BB251 


51 


BB659 




BB587 


51 


BB514 


55 




52 


BB513 


99 


BB402 




BB226 


100 


BB720 




BB005 


100 


BB370 




BB738 



ow.cuuyt-mNA sase (ileS) {Sc 
eucyl-tRNA Sase'fleuS) (Bsj 



(pheTMBbJ ^ * 

phenylalanyMRNA Sase su 
(pheSj {Bb} 

prolyl-tRNA Sase (proS) [Scl 



••jK^'iariyt-mi^ &ase (trsA) 
tyrosyMRNA Sase {tyrS) (Bs) 
valyl-tRNA Sase (valSj {Bs} 



66 
70 
54 



100 

100 

65 

62 

67 

65 

62 

67 



BB502 Directed RNA polymerase 
(rpoA) [Bsj 

,^V(B e b? t,RNAP ° lymeraSe 
88388 ,^ A C ft? dRNAP0VmeraSe 

88450 ( n^r asesi9ma - 54,ac,or 

Transcription factors 
BB107 n utilization substance prt B 

(nusB) {Ec} 
88800 N-utrlizetion substance pa A 

(nusA)fBs} 
BB394 transcription antitermination 
pp n p factor (nusG) {EcJ 
68132 ^"scnptfon elongation factor 

(greA) [EcJ 
BB355 transcription factor, put {MxJ 
BB230 janscnption termination fac or 
Rho (rho) {Bb} 

RNA processing 

88706 ^nucleotide adenytylTase 
(papS){Bs} 

Translation 

'3eneral 

3B590 dimethyladenosine Tase 

(*sgA) {Bs} 
3B802 ribosome-B factor A (rbfA) {Bs} 

™™*cyttRNA synthetases 

a'anyl-tRNA Sase (alaS) {Eel 
^inyt-lRNASase(argS){M, 5 < 
BJ01 asparaginyl-tRNA Sase (asnS) {Eel 73 
*M6.. aspartyJ-tRNASasefaspSHEc] ' ffi 
■» .<fteiny^tRNASa S e(<^sfHi * 

^OlycyHRNASase(glyS){Taj rp 
1835 J nistidyMRNA Sase (hlsHMj, g 



100 



64 
97 
71 

61 

100 

57 



BB608 aminoacyl-histidine dipeptidase 

(pepD)fHi) K 5 

rrarq am j no P e Ptidase I (yscl) (BbJ ioo 
BB069 ammopeptidase II {Bsj « 
BB611 ATP-depClp protease proteolytic 

component (clpP-i) {Hi) 7Q 
BB757 ATP^epC.pAeiseproteo.y.ic 

component (clpP-2) {Hi) en 

88369 ATP^epClp^roteasesubA 6 
{cfpA} [EcJ « 

BB612 ATP-depClp protease, sub X 

(clpX}(Ec} A 7 , 

BB834 ATP-depClp protease, sub C 
(dpC) {Pp} 

BB6 2 ?! ^ ep P roteas eLA(lon-1){Bb} 100 
E « h ? protease LA (lon-2) {Hi} 65 
BB359 carboxyl-terminal protease (ctp) 
{SynJ * wt 

88203 p L x a )^r bi,ity - 9overnin9 

88204 ( L h a fI ^ c f'^ 

BB248 oiigoendopeptidaseF(pe P F){LIJ 58 

88067 peptidase, put {Sc} « 
BB104 peripfasmic serine protease 

DO{htrA)[HiJ . gQ 

HR7RQ Pf"? di P e P tid ase (pepOJ [HiJ 49 

BB769 sialogycoprotease(gc P J{HiJ ' £ 
BB627 vacuolar X-protyl dipeptidyl 

aminopeptidase 

I (pepX) {Ml 55 

BB118 zinc protease, put {Hi} 54 
0b536 zinc protease, put {Hi} 52 

Nucfeoproteins 
BB232 hbbU prt {BbJ 



100 



68 



62 

62 

64 

56 
47 

100 



57 



62 
55 



Protein modification 

^Peptide deformylase 
(def) [SynJ 

BB648 serine/threonine kinase, put {Pfj 5l 

Bl^^hSfT' Sy ? thSSiS and Edification 

rrI27 n ? osoma ' L1 (rpIAJ [Bsj 71 

BB4B1 nbosomal pn L2 {rplBj (Bb 99 

BB478 nbosomal prt L3 (rpIC) [Bb 99 

r £ OSOma ' P^ L4 (rpID) {Bb 10 0 

RRdo? r 'bosomal prt L5 (rp lE ) {HiJ 8 o 

r !bosomalprtL6{rplF,(Sc 72 

BB390 nbosomal pn L7/L12 (rplLJ jSc) 75 

BR^i r ^ 0S0ma ! P ^9(rpll,fEcJ ' 57 

BR^i r ^ osoma ' P^ L10 (rpu, (Bsj gt 

rr??q r £ osomal PrtLl1 (rplK)fTmJ 73 

S r 'bosomal prt L13 (rplMj [Hi) 72 

BB488 nbosomal prt L14 (rpIN) (Tm, 79 

l^itl ^^'P^LlSfrpIO) Bs} el 

BRRrn r ^ osomal P^Ll6(r P IP,(Syn} ?? 

BB503 nbosomal prt L17 (rplQJ (Ec) m 

BB69Q ri h bOSOma| P^18 rpIR Bs * 

Rfif^ r !^ Soma 'P^Ll9(rpiS} Ec 74 

BB77R n ^ Soma 'P^L20{r P rr} Ec j Q 

S r n ^ SOma ' P^ ^MrplUj Ec 58 

Ba££ n ^ 0soma ' Prt L22 (rpfV) {Bb 10 o 

K r, b osoma 'PrtL23(rpIW}(Bb, S 
BB4S9 nbosomal prt L24 

(rplX) [Ec} ^ 
BB780 nbosomal prt L27 

(rpmA) [Hi} ft? 
BB350 nbosomal prt L28 (rpmB} 

{EcJ ft 

RR/ioc r !^° S0m a' Prt L29 (rpmCJ {Bsj 65 
ITU "Sf^JPnLWtnpmD Bs 60 
68229 nbosomal prt L31 (rpmE) {Bsj 69 



RmS r !? osoma, Prtl-32{rpmF)(BsJ 6 2 

S r, ^° soma, PrtL33(rpmGj(Bs) 76 

BB440 nbosomal prt L34 (rpmH) Bb ,5 

BB189 nbosomal prt L35 (rpml) {BaJ 74 

B^27 9 ^ osom aj Prt L36 (rpmj) {Bsj 89 

BB 27 nbosoma P rtSl(r P sA}{Ec) 55 

rrIS n ^ soma ' Prt S2 (rpsB) {Paj 79 

S "^somal pa S3 (rpsC) {Hi} 7? 

BB615 nbosomal prt S4 (rpsD} {Hi} ^ 

BB495 nbosoma prt S5 (rpsEj {Bs 77 

BB^ ^Pf ma PrtS6{rpsF)(Os} " 50 

BB386 nbosomal prt S7 (rpsG) (Sc) 7^ 

BB338 ^ S0 ^P^8(rp P S H lyn} 2 
BB338 nbosomal prt S9 (rpsl} {HiJ 7* 

BB^nT "^^'PrtSlOtrpsJjfBb} 100 
BR^7 n °° Soma P^SlKrpsKJfHiJ 7? 

BB500 nbosoma prt Sl3{rpsM){C P , 76 

BRftnl r !?° SOma PrtSHfrpsNjfBs} 72 

BrS? n ^ sorna, P rt Sl5(rp S 0){Tt} 77 

S ^P^'PrtSiefrpsPJfBs 70 

BB487 nbosomal prt Sl7 {rpsQ, Mc) ■ 76 

BB113 noosomalprtSl8(r P sR)[Bs} 78 

K>^ ma Pft Sl9 (r P sS » ( Bb 99 

S '^ oma, PrtS2l(rpsU}{Mx} e¥ 
BB516 rRNA methylase (yacO) \Mc] 66 

tRNA modification 

68821 fp^f^'^N^^ntyladenosine 
tKNA modification enzyme 
(miaA)(Ec) „ 
BB084 AT{nifS}{Synf f. 
BB343 g!u-tRNA amidoTase, sub C (gatC) 

{Bs} _ fi 

88341 fesj tRNA amfdoTase - sub B (gatB) 
BB342 gJtRNA amidoTase, sub A fgatA) 63 

BB064 methionyl-tRNA formylTase 61 

(/mt}{Ec} S6 

B8787 peptidyMRNA hydrolase (pth) [BbJ 100 

P!^° D ^ 100 
68021 { S ^ :lRNA rtbosylTase.isomerase 

BB809 tRNA-guanine transglycosylase 96 
[tgi) [ZmJ eg 
BB698 tRNA {guanine-Nl)-MTase (trmD) 

BB803 tRNA pseudouridine 55 Sase 

UruB) [EcJ 5? 

Translation factors 

llfi V P ;% m l mb ™* Prt OepA) [Hi} 76 
BB196 pepnde chain release factor 1 (prfA) 

BB074 peptide chain release factor 2 (prfe/ 3 

BB121 ribosome releasing factor (frr) (Mt} 68 
68169 jranslaticn initiation factor 1 (infA) 

BB801 translation initiation factor 2 (infB) ^ 

73 

BB190 translation initiation factor 3 (infC) 

dd {PvJ 7? 
BB691 translation elongation factor G 

(fus-2J {Tm} „ 
BB214 translation elongation factor P 

(efp) {EcJ 56 
BB476 translation elongation factor 

TU (tufj [Bo} 10Q 
BB122 translation elongation factor 

TS (tsf) [Hi} 5? 
BB540 translation elongation factor 

G(fus-1)fTT} gg 

Transport and binding proteins 

General 

BB573 ABC transporter, ATP-BP (Bsl w 

A8 Ctrans P oaer, ATP-BP SynJ 57 

Irt^ A6 ^ a ^P0.er, ATP-BP H f, 74 

eB754 ABC transporter, ATP-BP fBl) m 

88269 ATP-BP (ylxH-1) (BbJ ^ 
BB726 ATP-BP (ylxK-2) (Bb} ^ 

Ip38 

BBJ26 ABC transporter, ATP-BP {Mj} 62 

Amino acids, peptide, and amines 

oJj/29 glutamate trans P oner (gitp) fBsl *z 

BB401 glutamate transporter, put Bs} « 

BB146 GBP ABC transported ATP-BP 53 



(proV) [Sc] 
BB145 GBP ABC transporter, permease prt (proW) {Ec} 
BB144 GBP ABC transporter, BP (proX) {Ec} 
BB334 OP ABC transporter, ATP-BP (oppD) {Bs} 
BB335 OP ABC transporter, ATP-BP (oppF) {Bs} 
BB332 OP ABC transporter, permease prt (oppB-1) {Ec} 
BB747 OP ABC transporter, permease prt (oppB-2) {Bs} 
BB333 OP ABC transporter, permease prt (oppC-1) {Hi} 
BB746 OP ABC transporter, permease prt (oppC-2) {Bs} 
BB328 OP ABC transporter, periplasmic BP (oppA-1) {Bb} 
BB329 OP ABC transporter, periplasmic BP (oppA-2) {Bb} 
BB330 OP ABC transporter, periplasmic BP (oppA-3) {Bb} 
BB642 SP ABC transporter, ATP-BP (potA) {Ec} 
BB641 SP ABC transporter, permease prt (potB) {Ec} 
BB640 SP ABC transporter, permease prt (potC) {Ec} 
BB639 SP ABC transporter, periplasmic BP (potD) {Ec} 



71 




{Mg} 


56 


BB586 




BB557 


phosphocarrier prt HPr 




88141 


66 




(ptsH-2) (Hi} 


69 


IP28-4 




BB558 


phosphoenolpyruvate-prt PPase 




BBI26 


43 




(ptsl) {Sc) 


65 






BB408 


PTS system, fru-specific IIABC 




Ip25 


75 




(fruA-1){EcJ 


65 


BBE22 




BB629 


PTS system, fru-specific IIABC 






80 




(fruA-2) {Ec} 


68 






BB559 


PTS system, glu-specific IIA 




Transp 


68 




(err) {Bb} 


100 






BB645 


PTS system, glu-specific IIBC 




Ip38 


54 




(ptsG) {Sc} 


67 


BBJ05 




BB116 


PTS system, mal/glu-specific 






64 




IIABC (malX) {Ec} 


" 56 


lo36 




BB677 


RG ABC transporter, ATP-BP 




BBK25 


52 




(mgIA) {Mg} 


68 






BB678 


RG ABC transporter, 




IP28-1 






permease prt (rbsC-1) {Mg} 


51 


BBP18 


74 


BB679 


RG ABC transporter, 




BBF19 






permease prt (rbsC-2) {Mp} 


52 





94 



69 



65 



63 



53 



co26 






BBB04 


PTS system, cello-specific 






IIC (celB) {Bs} 


62 


BBB05 


PTS system, cello-specific 






IIA (ceIC) {Bs} 


61 


BBB06 


PTS system, cello-specific 






IIB (celA) {Bs} 


73 


BBB29 


PTS system, glu-specific 






IIBC, put {Ec} 


70 



(pncA) {Mt} 



transposase-like pa put {Bb} 



BBK25 transposase-like pa put {Bb} 



transposase-like pa put {Bb} 
transposase-like pa put {Bb} 

IP28-2 

BBG05 transposase-like prt {Bb} 



55 * 



56 



IP28-3 
BBH40 



transposase-like pa put {Bb] 



Id17 

BBD20 transposase-like pa put {Bb} 
BBD23 transposase-like pa put {Bb} 

Unknown 



80 



96 
96 



99 



57 



99 









Cations
BB528   aldose RDase, put {Bs}                                   57
lp54
BB724   K+ transport prt (ntpJ) {Eh}                             60
BB684   carotenoid biosyn prt, put {Ss}                          58
BBA34   OP ABC transporter, periplasmic BP (oppA-4) {Bc}         66
BB380   Mg2+ transport prt (mgtE) {Bb}                          100
BB671   chemotaxis operon prt (cheX) {Bb}                        99
BB164   Na+/Ca+ exchange prt, put {Mj}                           59
BB250   dedA prt (dedA) {Ec}                                     54
BB447   Na+/H+ antiporter (napA) {Eh}                            57
BB168   dnaK suppressor, put {Ec}                                53
cp26
BB637   Na+/H+ antiporter (nhaC-1) {Bf}                          48
BB508   GTP-BP {Tp}                                              59
BBB16   OP ABC transporter, periplasmic BP (oppA) {Bb}           78
BB638   Na+/H+ antiporter (nhaC-2) {Hi}                          50
BB219   gufA prt {Mx}                                            54
Other
BB421   hydrolase {Hi}                                           58
BB524   inositol monoPPase {Hs}                                  47
Anions
BB451   chromate transport prt, put {Mj}                         58
BB454   lipopolysaccharide biosyn-related prt {Mj}               49
BB218   P ABC transporter, ATP-BP (pstB) {Pa}                    74
Other categories
BB702   lipopolysaccharide biosyn-related prt {Hi}               62
BB216   P ABC transporter, permease prt (pstC) {Ec}              58
Adaptations and atypical conditions
BB237   acid-inducible prt (act206) {Rm}                         45
BB045   P115 prt {Mh}                                            53
BB217   P ABC transporter, permease prt (pstA) {Syn}             63
BB786   general stress prt (ctc) {Bs}                            51
BB336   P26 {Bb}                                                100
BB785   stage V sporulation prt G {Bm}                           74
BB363   periplasmic prt {Bb}                                    100
BB215   P ABC transporter, periplasmic P-BP (pstS) {Syn}         48
BB810   virulence factor mviN prt (mviN) {Hi}                    51
BB033   small prt (smpB) {Rp}                                    70
BB297   smg prt {Bb}                                            100
BB443   spoIIIJ-associated prt (jag) {Bs}                        56
Carbohydrates, organic alcohols, and acids
Colicin-related functions
BB240   glycerol uptake facilitator (glpF) {Bs}                  57
BB766   colicin V production prt, put {Hi}                       52
lp54
BB546   outer membrane integrity prt (tolA) {Hi}                 44
BBA76   thy1 prt (thy1) {Dd}                                     68
BB604   L-lactate permease (lctP) {Ec}                           57
BB318   methylgalactoside ABC transporter, ATP-BP (mglA) {Hi}    54
lp28-4
Drug and analog sensitivity
BBI06   pfs prt (pfs) {Ec}                                       59
BB140   acriflavine resistance prt (acrB) {Hi}                   53
BB814   pantothenate permease (panF) {Ec}                        63
cp9
BB258   bacitracin resistance prt (bacA) {Ec}                    56
BBC09   rev prt (rev) {Bb}                                       62
BB448   phosphocarrier prt HPr (ptsH-1)
BBC10   rev prt (rev) {Bb}                                       66




coding sequence. Biological roles were assigned to 59% of the 853
ORFs using the classification scheme adapted from Riley 29 (Fig. 1),
12% of ORFs matched hypothetical coding sequences of unknown
function from other organisms, and 29% were new genes. The
average relative molecular mass (Mr) of the chromosome-encoded
proteins in B. burgdorferi is 37,529, ranging from 3,369 to 254,242,
values similar to those observed in other bacteria including
Haemophilus influenzae 20 and Mycoplasma genitalium 21. The
median isoelectric point (pI) for all predicted proteins is 9.7.
Analysis of codon usage in B. burgdorferi reveals that all 61 triplet
codons are used. When both AU- and GC-containing codons
specify a single amino acid, there is a marked bias (from 2-fold to
more than 20-fold, depending on the amino acid) in the use of AU-rich
codons. The most frequently used codons are AAA (Lys, 8.1%),
AAU (Asn, 5.9%), AUU (Ile, 5.9%), UUU (Phe, 5.7%), GAA (Glu,
5.0%), GAU (Asp, 4.2%) and UUA (Leu, 4.2%). The most common
amino acids are Ile (10.6%), Leu (10.3%), Lys (10.2%), Ser (7.8%)
and Asn (7.2%). The high value for Lys is in agreement with the
median calculated isoelectric point of 9.7.
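
Codon percentages of the kind quoted above can be tallied directly from a set of in-frame coding sequences. The following is a minimal sketch under stated assumptions: the function name `codon_usage` and the toy input are hypothetical and are not taken from the original analysis, which does not describe its software.

```python
from collections import Counter


def codon_usage(coding_seqs):
    """Return {codon: percentage} over all in-frame codons (RNA alphabet)."""
    counts = Counter()
    for seq in coding_seqs:
        rna = seq.upper().replace("T", "U")
        # Walk the reading frame three bases at a time and
        # ignore any trailing partial codon.
        usable = len(rna) - len(rna) % 3
        for i in range(0, usable, 3):
            counts[rna[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons in input")
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Toy input (hypothetical, not a B. burgdorferi gene): six codons.
toy_gene = "".join(["ATG", "AAA", "AAA", "AAT", "ATT", "TAA"])
usage = codon_usage([toy_gene])
```

Run over all predicted ORFs, the same tally would give the genome-wide percentages; the AU bias can then be read off by comparing synonymous codons for a single amino acid.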

Plasmid analysis 

Analysis of the nucleotide sequence and Southern analyses on B.
burgdorferi DNA indicate that, in addition to the large linear
chromosome, isolate B31 contains linear plasmids of the following
approximate sizes: 56 kilobase pairs (kbp) (lp56), 54 kbp (lp54),
four plasmids of 28 kbp (lp28-1, lp28-2, lp28-3 and lp28-4), 38 kbp
(lp38), 36 kbp (lp36), 25 kbp (lp25) and 17 kbp (lp17); and circular
plasmids of the following sizes: 9 kbp (cp9), 26 kbp (cp26) and five
or six homologous plasmids of 32 kbp (cp32). These include all of
the plasmids previously identified in this strain, but comparisons
with other B31 cultures suggest that this isolate may have lost one 21
kbp linear and one or two 32 kbp circular plasmids during growth in
culture since its original isolation 11,14,19,30. The sequences of all
plasmids were assembled as part of this project. However, the
assembled sequences of the cp32 and related lp56 plasmids could
not be determined with a high degree of confidence because of DNA
sequence similarity among them (>99% in several regions of



Table 2 Gene identification numbers are listed with the prefix BB as in Fig. 2. Each gene
identified is listed in its functional role category (adapted from Riley 29). The percentage of
similarity and a two-letter abbreviation for genus and species for the best match are also
shown. An expanded version of this table with additional information is available on the
World-Wide Web at http://www.tigr.org/tdb/mdb/bbdb/bbdb.htm. Abbreviations of gene
names are: Ac, acetyl; BP, binding protein; biosyn, biosynthesis; cello, cellobiose; CPDase,
carboxypeptidase; Dcase, decarboxylase; DHase, dehydrogenase; flgr, flagellar/flagellum;
fru, fructose; GBP, glycine, betaine, L-proline; glu, glucose; Kase, kinase; mal, maltose; MC,
methyl-accepting chemotaxis; MTase, methyltransferase; NAG, N-acetylglucosamine; OH,
hydroxy; OP, oligopeptide; P, phosphate; PPTase, phosphotransferase; PPase, phosphatase;
prt, protein; put, putative; RDase, reductase; RG, ribose/galactose; SAM, S-adenosyl-
methionine; Sase, synthetase/synthase; SP, spermidine/putrescine; ss, single-stranded;
sub, subunit; Tase, transferase.

Abbreviations of genus and species are: Ah, Aeromonas hydrophila; Ar, Agrobacterium
radiobacter; Al, Alteromonas sp.; Ab, Anabaena sp.; An, Anacystis nidulans; At, Arabidopsis
thaliana; Av, Azotobacter vinelandii; Bf, Bacillus firmus; Bl, Bacillus licheniformis; Bm,
Bacillus megaterium; Bs, Bacillus stearothermophilus; Bs, Bacillus subtilis; Bb, Borrelia
burgdorferi; Bc, Borrelia coriaceae; Bh, Borrelia hermsii; Ba, Buchnera aphidicola; Ca,
Clostridium acetobutylicum; Cl, Clostridium longisporum; Cp, Clostridium perfringens; Cg,
Corynebacterium glutamicum; Cb, Coxiella burnetii; Cp, Cyanophora paradoxa; Dd,
Dictyostelium discoideum; Ec, Escherichia coli; Eh, Entamoeba histolytica; Ec,
Enterobacter cloacae; Ef, Enterococcus faecalis; Eh, Enterococcus hirae; Ha,
Haemophilus aegyptius; Hi, Haemophilus influenzae; Hp, Helicobacter pylori; Hs, Homo
sapiens; La, Lactobacillus acidophilus; Ll, Lactococcus lactis; Li, Leptospira interrogans;
[...]; Mj, Methanococcus jannaschii; Mb, Methanosarcina barkeri; Ml,
Mycobacterium leprae; Mt, Mycobacterium tuberculosis; Mc, Mycoplasma capricolum;
Mg, Mycoplasma genitalium; Mh, Mycoplasma hominis; Mh, Mycoplasma hyorhinis; Mm,
Mycoplasma mycoides; Mp, Mycoplasma pneumoniae; Mx, Myxococcus xanthus; Ng,
Neisseria gonorrhoeae; Nm, Neisseria meningitidis; Os, Odontella sinensis; Pt,
Paramecium tetraurelia; Pa, Pediococcus acidilactici; Pf, Plasmodium falciparum; Pg,
Porphyromonas gingivalis; Pv, Proteus vulgaris; Pa, Pseudomonas aeruginosa; Pm,
Pseudomonas mevalonii; Pp, Pseudomonas putida; Rm, Rhizobium meliloti; Rc,
Rhodobacter capsulatus; Rs, Rhodobacter sphaeroides; Rp, Rickettsia prowazekii; Sc,
Saccharomyces cerevisiae; Sc, Salmonella choleraesuis; St, Salmonella typhimurium; Sh,
[...] hyodysenteriae; Sd, Shigella dysenteriae; So, Spinacia oleracea; Sc,
Staphylococcus carnosus; Se, Staphylococcus epidermidis; Sp, Streptococcus pyogenes;
[...] coelicolor; Ss, Sulfolobus solfataricus; Syn, Synechococcus sp.; Sp,
[...]; [...] Thermoanaerobacterium thermosaccharolyticum; Tb, Ther[...]
RT8.B4; Ttv, Thermoproteus tenax virus; Tm, Thermotoga maritima; [...]
thermophilus; Ta, Thermus aquaticus; Td, Treponema denticola; Tp,
[...]; [...] Triticum aestivum; Tb, Trypanosoma brucei mitochondrion; Vc,
[...]; [...] parahaemolyticus; Zm, Zymomonas mobilis.

DECEMBER 1997
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3,000-5,000 bp per plasmid)' 3 ' 16 (Table 1). Improved assembly | 
strategies are being tested to achieve closure on these plasmids
(G. Sutton, unpublished). Plasmid lp17 is identical to that of lp16.9
from Barbour et al. 15.

The 11 plasmids we have described contain a total of 430 putative
ORFs with an average size of 507 bp; plasmid G+C content ranges 
from 23.1% to 32.3%. Only 71% of plasmid DNA represents 
predicted coding sequences, a value significantly lower than that 
on the chromosome. This indicates that average intergenic distances 
are greater in the plasmids than in the chromosome, and that many 
potential ORFs contain authentic frameshifts or stops (see E29, for 
example), suggesting that they are decaying genes not encoding 
functional proteins. Of the 430 plasmid ORFs, only 70 (16%) could 
be identified and these include membrane proteins such as OspA-D, 
decorin-binding proteins, the VlsE lipoprotein recombination cas- 
sette, and the purine ribonucleotide biosynthetic enzymes GuaA 
and GuaB. We found that 100 ORFs (23%) match other hypothetical
proteins from plasmids in this and related strains of B.
burgdorferi 15,16,31; 10 ORFs (2.3%) match hypothetical proteins
from species other than Borrelia; and 250 ORFs (58%) have no
database match.
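
As a worked check of the arithmetic in this paragraph, the four classes of plasmid ORFs should sum to 430 and reproduce the quoted percentages. This is a sketch only; the dictionary keys are descriptive labels chosen here, not terms from the original table.

```python
PLASMID_ORFS = 430

# Counts as quoted in the text for the 11 assembled plasmids.
categories = {
    "identified database match": 70,
    "hypothetical, Borrelia plasmids": 100,
    "hypothetical, other species": 10,
    "no database match": 250,
}

# The four classes should account for every plasmid ORF.
assert sum(categories.values()) == PLASMID_ORFS

# Percentage of all plasmid ORFs in each class, to one decimal place.
percent = {name: round(100 * n / PLASMID_ORFS, 1) for name, n in categories.items()}
```

The two "hypothetical" classes together give 110 ORFs, or about 26% of the total, which matches the combined figure reported in Table 1.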

We found that 47 paralogous gene families containing from 2 to 
12 members account for 39% (169 ORFs) of the plasmid-encoded 
genes with no known biological role (Fig. 1). Paralogue families 32 
and 50, typified by previously identified B. burgdorferi plasmid 
genes cp32 orfC and cp8.3 orf2, respectively, have some similarities 
to proteins involved in replication, segregation and control of copy 
number in other bacterial systems 16,31. Previous studies have
reported examples of plasmid gene duplication, but the extent of 



Table 1 Genome features in Borrelia burgdorferi

Chromosome                                      910,725 bp (28.6% G+C)
  Coding sequences (93%)
  RNAs (0.7%)
  Intergenic sequence (6.3%)
  853 coding sequences
    500 (59%) with identified database match
    104 (12%) match hypothetical proteins
    249 (29%) with no database match

Plasmids
  cp9                                           9,386 bp (23.6% GC)
  cp26                                          26,497 bp (26.3% GC)
  lp17                                          16,828 bp (23.1% GC)
  lp25                                          24,182 bp (23.3% GC)
  lp28-1                                        26,926 bp (32.3% GC)
  lp28-2                                        29,771 bp (31.5% GC)
  lp28-3                                        28,605 bp (25.1% GC)
  lp28-4                                        27,329 bp (24.4% GC)
  lp36                                          36,834 bp (26.8% GC)
  lp38                                          38,853 bp (26.1% GC)
  lp54                                          53,590 bp (28.1% GC)
  Coding sequences (71%)
  Intergenic sequence (29%)
  430 coding sequences
    70 (16%) with identified database match
    110 (26%) match hypothetical proteins
    250 (58%) with no database match

Ribosomal RNA                                   Chromosome coordinates
  16S                                           444581-446118
  23S                                           438590-441508
  5S                                            438446-438557
  23S                                           435334-438267
  5S                                            435201-435312

Stable RNA
  tmRNA                                         46973-47335
  rnpB                                          750816-751175

Transfer RNA
  34 species (8 clusters, 14 single genes)

*The telomeric sequences of the nine linear plasmids assembled as part of this study were
not determined; estimation of the number of missing terminal nucleotides by restriction
analysis suggests that less than 1,200 bp is missing in all cases. Comparisons with
previously determined sequences of lp16.9 and one terminus of lp28-1 indicate that 25,
60 and 1,200 bp are missing, respectively.
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this redundancy has become even more apparent with the complete 
sequence of these 11 plasmids from isolate B31. Moreover, a
preliminary search of 221 putative ORFs from the cp32s and lp56
[...] 70% amino-acid similarity [...]
the 11 plasmids presented here (data not shown).
Although plasmid-encoded genes have been implicated in infectivity
and virulence, the biological roles of most of these genes are
not known. The significance of the large number of paralogous
plasmid-encoded genes is not understood. These proteins may
be expressed differentially in tick and mammalian hosts, or may
undergo homologous recombination to generate antigenic variation
in surface proteins. This hypothesis is supported by the
[...] plasmid-encoded putative membrane lipoproteins.

[...] copies of a putative recombinase/transposase similar to
IS891-like transposases were identified in the B. burgdorferi
plasmids. Plasmid lp28-2 contains one full-length copy of this
gene. Although no inverted repeats were found on either side of the
transposase, there is a putative ribosome-binding site several
nucleotides upstream of the apparent start codon and a stem-loop
structure (-27 kcal mol-1) 95 bp downstream of the stop
codon in an area with no ORFs. This transposase might represent a
functional gene important for the frequent DNA rearrangements
that presumably occur in Borrelia plasmids. There are other partial
and complete copies of the transposase gene that contain
frame-destroying mutations elsewhere in the genome: two copies on
lp17, one on lp36, one on lp38, one on lp28-3, two on lp28-1 and
one near the right end of the large linear chromosome.



the putative origin of replication. The GC skew is [...]
uniformly negative from 0 to 450 kb (minus strand) [...]
(Fig. 2). Additional evidence for the location of the origin of
replication comes from our discovery of an octamer (TTGTTTTT)
whose skewed distribution in the plus versus the minus strand of the
chromosome matches the GC skew (Fig. 2). The biological
significance of this octamer has not yet been determined, [...]
may be analogous to the Chi site in Escherichia coli, [...]
in recBCD-mediated recombination. No GC skew [...]
any of the plasmids, although the hep[...]
skewed distribution in the plus versus the minus strand [...]
shown) [...] at approximate midpoint of [...]



Origin of replication 

The replication mechanism for the linear chromosome and plasmids
in B. burgdorferi is not yet known. Replication possibly begins
at the termini, as has been proposed for the poxvirus hairpin
telomeres, or may begin from a single origin somewhere along
the length of the linear replicon. Of the genes on the linear
chromosome, 66% are transcribed away from the centre of the
chromosome (Fig. 1), similar to the transcriptional bias observed
for the genomes of M. genitalium and M. pneumoniae. It has
been suggested that bacterial genes are optimally transcribed in the
same direction as that in which replication forks pass over them,
particularly for highly transcribed genes 35.
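
The transcriptional bias described above can be quantified from a gene list with coordinates and strands. The sketch below assumes a simple representation, tuples of (start, end, strand), and a hypothetical helper name `fraction_transcribed_away`; neither comes from the original study.

```python
def fraction_transcribed_away(genes, chromosome_length):
    """Fraction of genes whose direction of transcription points away
    from the chromosome centre.

    genes: iterable of (start, end, strand) with strand '+' or '-'.
    """
    centre = chromosome_length / 2
    away = 0
    total = 0
    for start, end, strand in genes:
        midpoint = (start + end) / 2
        # Right of centre, '+' strand genes point towards the right end;
        # left of centre, '-' strand genes point towards the left end.
        if (midpoint >= centre and strand == "+") or (midpoint < centre and strand == "-"):
            away += 1
        total += 1
    if total == 0:
        raise ValueError("gene list is empty")
    return away / total


# Toy gene list on a hypothetical 1,000-bp linear replicon.
toy_genes = [(100, 200, "-"), (300, 400, "+"), (600, 700, "+"), (800, 900, "-")]
bias = fraction_transcribed_away(toy_genes, 1000)
```

A value well above 0.5, such as the 66% reported for the chromosome, is what a centrally located, bidirectional origin would be expected to produce if genes tend to be co-oriented with replication.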

[...] transcriptional bias observed in B. burgdorferi, it seems
likely that the origin of replication is near the centre of the
chromosome. Because bacterial chromosomal replication origins
[...] it is intriguing to note that this gene
(BB437) lies almost exactly at the centre of the linear B. burgdorferi
chromosome. A centrally initiated, bi-directional replication
fork would be equidistant from the two chromosome ends, and
replication would traverse the rRNA genes in the same direction as
transcription.

An analysis of GC skew, (G - C)/(G + C) calculated in 10-kilobase
(kb) windows across the chromosome, shows a clear break at






Transcription and translation 

Genes encoding the three subunits (α, β, β') of the core RNA
polymerase were identified in B. burgdorferi [...]
alternative σ factors, σ[...] and rpoS. The role [...] of
these σ factors in transcription regulation in B. burgdorferi [...].
The nusA, nusB, [...] and rho genes, which are involved in
transcription elongation and termination, were also identified.

A region of the genome with significantly higher G + C content
[...] between nucleotides 434,000 and 447,000, [...]
the rRNA operon. As previously reported, the rRNA operon in B.
burgdorferi contains a 16S rRNA-Ala-tRNA-[...]
rRNA-5S rRNA-23S rRNA-5S rRNA. [...] of the genes [...]
present in the same orientation, except for that encoding [...]
tRNA. Four unrelated genes, encoding 3-methyladenine glycosylase,
hydrolase and two with no database match, are also present in [...].

We identified in the chromosome 31 tRNAs with specificity for all
20 amino acids (Fig. 1). These are organized into 7 clusters [...]
single genes. All tRNA synthetases are present except glutaminyl
tRNA synthetase. A single glutamyl tRNA synthetase probably
aminoacylates both tRNA Glu and tRNA Gln with glutamate, followed
by transamidation by Glu-tRNA amidotransferase, a heterotrimeric
enzyme present in B. burgdorferi and several Gram-positive bacteria
and archaea. The lysyl-tRNA synthetase (LysS) in B. burgdorferi is a
class I type that has no resemblance to any known bacterial or
eukaryotic LysS, but is most similar to LysS from the archaea [...].

Replication, repair and recombination

The complement of genes in B. burgdorferi involved in DNA
replication is smaller than in E. coli, but similar to that in M.
genitalium. Three ORFs have been identified with high homology
to four of the ten polypeptides in the E. coli DNA polymerase III: α,
β, and γ and τ. In E. coli, the γ and τ proteins are produced [...]
ribosomal frameshifting. This observation suggests
that DNA replication in B. burgdorferi, like that in M. genitalium, is
accomplished with a restricted set of genes. B. burgdorferi [...]






Figure 2 Distribution of TTGTTTTT and GC
skew in the B. burgdorferi chromosome. Top,
distribution of the octamer TTGTTTTT. The
lines in the top panel represent the location of
this octamer in the plus strand of the
sequence, and those in the second panel
represent the location of this oligomer in the
minus strand of the sequence. Bottom, GC
skew.
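
Both quantities plotted in Figure 2 are straightforward to compute from a sequence string. The following sketch uses the (G - C)/(G + C) definition and the 10-kb window given in the text; the function names and the toy sequence are illustrative assumptions, not the authors' code.

```python
def gc_skew(seq, window=10_000):
    """Return [(window_start, (G - C)/(G + C))] for consecutive windows."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C are reported as zero skew.
        skew = (g - c) / (g + c) if g + c else 0.0
        result.append((start, skew))
    return result


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, motif):
    """0-based start positions of motif in seq, allowing overlaps."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def octamer_positions(seq, motif="TTGTTTTT"):
    """Positions of the motif on the plus strand and on the minus strand.

    A minus-strand occurrence shows up on the plus strand as the
    reverse complement of the motif.
    """
    seq = seq.upper()
    return find_all(seq, motif), find_all(seq, reverse_complement(motif))


# Toy sequence (hypothetical): one plus-strand and one minus-strand octamer.
toy = "GGGC" + "TTGTTTTT" + "AAAAACAA" + "CCCG"
skews = gc_skew(toy, window=12)
plus_hits, minus_hits = octamer_positions(toy)
```

On a real chromosome, a change in the sign of the windowed skew, together with a switch in which strand carries most of the octamers, would mark the candidate origin.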






[...] despite its linear chromosomal structure. [...]
Two of three DNA mismatch repair enzymes (mutS, mutL) are [...]

GATC (dam) methylation in strain B31 (S. Casjens, unpublished) [...]

[...] genes for the repair of ultraviolet-induced DNA
damage (uvrA, uvrB, uvrC and uvrD) (Table 2).

B. burgdorferi has a complete set of genes to perform homologous
recombination including recA, recBCD, sbcC, sbcD, recG, [...]

[...] exonuclease activity associated with [...] in E. coli may be
encoded by exoA (exodeoxynuclease III). Although recA is present
we found no evidence for lexA, which encodes the repressor that



[...] B. burgdorferi. A schematic [...]
cell providing an integrated view of the transporters [...]
chromosomal and blue indicates plasmid ORFs) [...]
ORF numbers correspond to those listed in Table 2 [...]
direction of catalysis; [...] expected activities that were not found.
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regulates SOS genes in E. coli. No genes encoding DNA restriction or 
modification enzymes are present. 

Biosynthetic pathways

The small genome size of B. burgdorferi is associated with an 
apparent absence of genes for the synthesis of amino acids, fatty 
acids, enzyme cofactors, and nucleotides, similar to that observed 
with M. genitalium 21 (Fig. 3, Table 2). The lack of biosynthetic 
pathways explains why growth of B. burgdorferi in vitro requires 
serum-supplemented mammalian tissue-culture medium. This is 
also consistent with previous biochemical data indicating that 
Borrelia lack the ability to elongate long-chain fatty acids, such 
that the fatty-acid composition of Borrelia cells reflects that present 
in the growth medium 6 . 

Transport 

The linear chromosome of B. burgdorferi contains 46 ORFs and the 
plasmids contain 6 ORFs that encode transport and binding 
proteins (Fig. 3, Table 2). These gene products contribute to 16 
distinct membrane transporters for amino acids, carbohydrates, 
anions and cations. The distribution of transporters between the 
four categories of functions in this section is similar to that observed 
in other heterotrophs (such as Haemophilus influenzae, M. genitalium
and H. pylori), with most being dedicated to the import of
organic compounds. 
There are marked similarities between the transport capacity of B. 
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•AAATATAATTTAATAGTATAAAAAACTGTTT-3 ' 
TTATATTAAATTATCATATTTTTTGACAAA-

TAAATATAATTTAATAGTATAAAAAAAATTAA-3'  2
ATTTATATTAAATTATCATATTTTTTTTAATT-

AATATATAATCTAATAGTATACAAAAGATTCA-3'  3
TTATATATTAGATTATCATATGTTTTCTAAGT-

ATATAATTTTTAATTAGTATAGAATATGTTAA-3'  4
TATATTAAAAATTAATCATATCTTATACAATT-

ATATAATTTTTTATTAGTATAGAGTATTTTGA-3'  5
TATATTAAAAAATAATCATATCTCATAAAACT-

ATATAATATTTATTTAGTACAAAGTTCAATTT-3'  6
TATATTATAAATAAATCATGTTTCAAGTTAAA-

ATATAATTTGATATTAGTACAAATCCCCTTGC-3'  7
TATATTAAACTATAATCATGTTTAGGGGAACG-

TATTTATTATCTTTTAGTATATATATCTCTCG-3'  8
ATAAATAATAGAAAATCATATATATAGAGAGC-

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Figure 4 Telomere nucleotide sequences from Borrelia species. Nucleotide 
sequences are shown for known Borrelia telomeres as indicated: 1, B. burgdorferi
Sh-2-82 chromosome left end; 2, B. burgdorferi B31 chromosome left end; 3,
B. afzelii R-IP3 chromosome right end; 4, B. burgdorferi B31 chromosome right
end; 5, B. burgdorferi B31 plasmid lp17 left end; 6, B. burgdorferi B31 plasmid lp17
right end; 7, B. hermsii plasmids bp7E and pb21E right ends; 8, B. burgdorferi B31
plasmid lp28-1 right end. In each case the telomere is at the left. Question marks
(?) indicate locations where S1 nuclease was used to open terminal hairpins
during the sequence determinations. Stippled areas highlight regions that appear
to have been most highly conserved among these telomeres; no strong
sequence conservation has been found near the right of the terminal 26 bp
among the different sequences listed, except between the chromosomal left
ends from strains B31 and Sh-2-82 (see text). The telomeric sequences of the
strain B31 chromosome were determined in this report; the others are from
references 14, 28, 30, 45, 46.



burgdorferi and M. genitalium. Both genomes have a small
number of recognizable transporters, so it is not clear how they
can sustain diverse physiological reactions. Several of the
transporters in both genomes exhibit broad substrate specificity,
exemplified by the oligopeptide ABC transporter (opp [...]) and
the glycine, betaine, L-proline transport system (pro[...]). Therefore,
these organisms probably compensate for their limited
coding potential by producing proteins that can import a wide
variety of solutes. This is important because B. burgdorferi is unable
to synthesize any amino acids de novo. We were unable to identify
any transport systems for nucleosides, nucleotides, NAD/[...] or
fatty acids, although they are likely to be present.
Glucose, fructose, maltose and disaccharides seem to be acquired
by the phosphoenolpyruvate:phosphotransferase system (PTS).
The two nonspecific components, enzyme I (ptsI) and Hpr [...],
are associated in one operon with an apparently glucose-specific
phosphohistidine-sugar phosphotransferase enzyme [...].
Separate from this operon are four permeases (enzyme [...]):
fruA in two copies (fructose), ptsG (glucose) and malX (glucose/
maltose) (Fig. 3, Table 2). The fructose-specific enzyme [...] is
included in the ORF with IIBC (fruA), as has been observed in M.
genitalium 41. Ribose may be imported by an ATP-binding [...]
transporter (rbsAC). The rbsAC genes are transcribed in an operon
with a methyl-accepting chemotaxis protein that may respond to
galactosides, suggesting that movement of the organisms towards
sugars may be coupled to the transport process.

Energy metabolism 

The limited metabolic capacity of B. burgdorferi is similar to that
found in M. genitalium (Fig. 3, Table 2). Genes encoding all of the
enzymes of the glycolytic pathway were identified. Analysis of the
metabolic pathway suggests that B. burgdorferi uses glucose as a
primary energy source, although other carbohydrates, including
glycerol, glucosamine, fructose and maltose, may be used in
glycolysis. Pyruvate produced by glycolysis is converted to lactate,
consistent with the microaerophilic nature of B. burgdorferi.
Generation of reducing power occurs through the oxidative branch of
the pentose pathway. None of the genes encoding proteins of the
tricarboxylic acid cycle or oxidative phosphorylation were identified.
The similarity in metabolic strategies of two distantly related,
obligate parasites, M. genitalium and B. burgdorferi, suggests
convergent evolutionary gene loss from more metabolically competent,
distant progenitors.
Addition of N-acetylglucosamine (NAG) to culture medium is
required for growth of B. burgdorferi 6. NAG is incorporated into the
cell wall, and may also serve as an energy source. The cp26 plasmid
encodes a PTS cellobiose transporter homologue that could have
specificity for the structurally similar compound chitobiose (di-N-
acetyl-D-glucosamine). A gene product on the chromosome with
sequence similarity to chitobiase (BB2) may convert chitobiose to
NAG. B. burgdorferi can metabolize NAG to fructose-6-phosphate,
which then can enter the glycolytic cycle through the action of N-
acetylglucosamine-6-phosphate deacetylase and glucosamine-6-
phosphate isomerase. NAG is the primary constituent of chitin,
which makes up the tick cuticle 6, and may be a source of
carbohydrate for B. burgdorferi when it is associated with its tick host.

The parallels between B. burgdorferi and M. genitalium appear to
extend to other aspects of their metabolism. Both organisms lack a
respiratory electron transport chain, so ATP production must be
accomplished by substrate-level phosphorylation. Consequently,
membrane potential is established by the reverse reaction of the
V1V0-type ATP synthase, here functioning as an ATPase to expel
protons from the cytoplasm (Fig. 3, Table 2). The ATP synthase
genes in B. burgdorferi appear to be transcribed as part of a seven-
gene operon. They are not typical of those usually found in
eubacteria, more closely resembling the eukaryotic vacuolar (V-
type) and archaeal (A-type) H+-translocating ATPases, both in [...]
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An approximately 7.5-fold genome coverage was achieved by generating 19,078 
sequences from a small insert plasmid library with an average edited length of 
505 bases. The ends of 69 large insert lambda clones were sequenced to obtain a 
genome scaffold; 50% of the genome was covered by at least one lambda clone. 
Sequences were assembled using TIGR Assembler as described 20 " 24 , resulting in 
a total of 524 assemblies containing at least two sequences, which were clustered 
into 85 groups based on linking information from forward and reverse 
sequence reads. All Borrelia sequences that had been mapped were searched
against the assemblies in an attempt to delineate which were derived from the 
various elements of the B. burgdorferi genome. Some contigs were also located 
on the existing physical map by Southern analysis. Sequence and physical gaps 
for the chromosome were closed as described 20 -". At the completion of the 
project, less than 3% of the chromosome had single-fold coverage. The linear 
chromosome of B. burgdorferi has covalently closed hairpin structures at its 
termini that are similar to those reported for linear plasmids in this organism". 
The telomeric sequences (106 and 72 bp, respectively, from the left and right 
ends) were obtained after nicking the terminal loop with S1 nuclease and
amplifying terminal sequences by ligation-mediated polymerase chain reaction
(PCR) as described 28 . The unknown terminal sequence was determined in both 
directions on four independent plasmid clones of the amplified DNA from each 
telomere. A minimum amount of S1 nuclease was used and, because of their
sequence similarity to other Borrelia telomeres, it is likely that few, if any, 
nucleotides were lost from the B31 chromosomal telomeres in this process. 
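
The fold-coverage figure at the start of this section follows from the usual relation, coverage = reads x mean read length / genome size. The sketch below works through that arithmetic. The total genome size is not stated in this passage, so the code derives the size implied by the quoted 7.5-fold figure instead of assuming one; `fold_coverage` is a hypothetical helper name.

```python
def fold_coverage(n_reads, mean_read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * mean_read_length / genome_size


# Figures quoted in the text.
N_READS = 19_078
MEAN_EDITED_LENGTH = 505  # bases

total_bases = N_READS * MEAN_EDITED_LENGTH

# Genome size (chromosome plus all plasmids) that would be consistent
# with "approximately 7.5-fold" coverage. This is an inference, not a
# value reported in the passage.
implied_genome_size = total_bases / 7.5
```

The implied size, roughly 1.28 Mb, exceeds the 910,725-bp chromosome because the shotgun library also sampled the linear and circular plasmids.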
Identification of ORFs. Coding regions (ORFs) were identified using 
compositional analysis using an interpolated Markov model based on 
variable-length oligomers 47 . ORFs of >600bp were used to train the Markov 
model, as well as B. burgdorferi ORFs from GenBank. Once trained, the model 
was applied to the complete B. burgdorferi genome sequence and identified 953 
candidate ORFs. ORFs that overlapped were visually inspected, and in some
cases removed. Non-overlapping ORFs that were found between predicted 
coding regions and >30 amino acids in length were retained and included in 
the final annotation. All putative ORFs were searched against a non-redundant 
amino-acid database as described 20 "". ORFs were also analysed using 527 
hidden Markov models constructed for several conserved protein families 
(PFAM v2.0) using HMMER 48 . Families of paralogous genes were constructed 
by pairwise searches of proteins using FASTA. Matches that spanned at least 
60% of the smaller of the protein pair were retained and visually inspected. 
A total of 94 paralogous gene families containing 293 genes were identified 
(Fig. 1). 
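
One way to turn pairwise matches into families, after applying the 60% span criterion described above, is single-linkage clustering with a union-find structure. The text does not say how families were merged, so single linkage is an assumption made here for illustration, as are the function name and the input format (alignment lengths standing in for FASTA output).

```python
def paralog_families(lengths, matches, min_span=0.60):
    """Group proteins into families by single-linkage clustering.

    lengths: {protein_id: length in residues}
    matches: iterable of (id_a, id_b, aligned_length)
    A match is kept only if it spans at least min_span of the
    shorter protein of the pair.
    """
    parent = {p: p for p in lengths}

    def find(p):
        # Follow parent links to the root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, aligned in matches:
        if aligned >= min_span * min(lengths[a], lengths[b]):
            parent[find(a)] = find(b)

    families = {}
    for p in lengths:
        families.setdefault(find(p), set()).add(p)
    # Singletons are not families.
    return [members for members in families.values() if len(members) > 1]


# Toy data (hypothetical proteins): A-B and B-C pass the filter, C-D does not.
toy_lengths = {"A": 100, "B": 200, "C": 150, "D": 300}
toy_matches = [("A", "B", 70), ("B", "C", 100), ("C", "D", 50)]
toy_families = paralog_families(toy_lengths, toy_matches)
```

Single linkage can chain loosely related proteins into one family, which is presumably one reason the retained matches were also inspected visually.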

Identification of membrane-spanning domains (MSDs). TopPred 49 was
used to identify potential MSDs in proteins. A total of 526 proteins containing 
at least one putative MSD were identified, of which 183 were predicted to have 
more than one MSD. The presence of signal peptides and the probable position 
of a cleavage site in secreted proteins were detected using Signal-P as 
described 23 ; 189 proteins were predicted to have a signal peptide. Lipoproteins 
were identified by scanning for a lipobox in the first 30 amino acids of every 
protein. A consensus sequence relaxed from that used for H. pylori was
defined for the purpose of this search based on known or putative B. burgdorferi
lipoprotein consensus sequences. 
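
The lipobox scan described above amounts to a pattern search restricted to the N-terminal 30 residues. The relaxed consensus the authors used is not given in this passage, so the pattern below is a placeholder of the general lipobox shape (three small or hydrophobic residues followed by the lipidated cysteine) and should be treated as hypothetical, as should the helper name `has_lipobox`.

```python
import re

# Hypothetical lipobox pattern, for illustration only; it is NOT the
# consensus used in the original study.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def has_lipobox(protein, window=30):
    """True if the pattern lies entirely within the first `window` residues."""
    return LIPOBOX.search(protein[:window].upper()) is not None


# Toy sequences (hypothetical): motif inside and outside the window.
candidate = "MKKYLLGIGLILALIAC" + "K" * 30
non_candidate = "M" + "K" * 40 + "LIAC"
```

With these toy inputs, `has_lipobox(candidate)` is true and `has_lipobox(non_candidate)` is false, because the second motif lies beyond residue 30.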
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describe here the complete genome sequence (1,111,523 base pairs) of the obligate Intracellular parasite 
Rickettsia prowazekii, the causative agent of epidemic typhus. This genome contains 834 protein-coding genes. The
functional profiles of these genes show similarities to those of mitochondrial genes: no genes required for anaerobic
glycolysis are found in either R. prowazekii or mitochondrial genomes, but a complete set of genes encoding
components of the tricarboxylic acid cycle and the respiratory-chain complex is found in R. prowazekii. In effect, ATP
production in Rickettsia is the same as that in mitochondria. Many genes involved in the biosynthesis and regulation of
biosynthesis of amino acids and nucleosides in free-living bacteria are absent from R. prowazekii and mitochondria.
Such genes seem to have been replaced by homologues in the nuclear (host) genome. The R. prowazekii genome
contains the highest proportion of non-coding DNA (24%) detected so far in a microbial genome. Such non-coding
sequences may be degraded remnants of 'neutralized' genes that await elimination from the genome. Phylogenetic
analyses indicate that R. prowazekii is more closely related to mitochondria than is any other microbe studied so far.



The Rickettsia are α-proteobacteria that multiply in eukaryotic cells
[...]. R. prowazekii is the agent of epidemic, louse-borne typhus in
humans. Three features of this endocellular parasite deserve our
attention. First, R. prowazekii is estimated to have infected 20-30
million humans in the wake of the First World War and killed
another few million following the Second World War (ref. 1).
Because it is the descendant of free-living organisms 2-4, its
genome provides insight into adaptations to the obligate
intracellular lifestyle, with probable practical value. Second, phylogenetic
analyses based on sequences of ribosomal RNA and heat-shock
proteins indicate that mitochondria may be derived from the
α-proteobacteria 5,6. Indeed, the closest extant relatives of the ancestor
to mitochondria seem to be the Rickettsia 7-10. That modern
Rickettsia favour an intracellular lifestyle identifies these bacteria
as the sort of organism that might have initiated the endosymbiotic
scenario leading to modern mitochondria 11. Finally, the genome of
R. prowazekii is a small one, containing only 1,111,523 base pairs
(bp). Its phylogenetic placement and many other characteristics
identify it as a descendant of bacteria with substantially larger
genomes 2-4. Thus Rickettsia, like mitochondria, are good examples
of highly derived genomes, the products of several types of reductive
evolution.

The genome sequence of R. prowazekii indicates that these three
features may be related. For example, prokaryotic genomes evolving
within a cell dominated by a much larger, eukaryote genome and
constrained by bottle-necked population dynamics will tend to lose
genetic information 12,13. Predictable sets of expendable genes will
tend to disappear from the prokaryotic genome when they are made
redundant by the activities of nuclear genes. Likewise, non-essential
sequences and otherwise highly conserved gene clusters may be
obliterated by deleterious mutations that are fixed in clonal parasite
or organelle populations because they cannot be eliminated by
[...]. This process is ongoing in the Rickettsia genomes, as
shown by the identification of sequences that have recently become
pseudogenes. Also, a large fraction (~25%) of non-coding
sequences in this genome may be gene remnants that have been
degraded by mutation and have not yet been removed from the
genome. Finally, transfer of genes from a mitochondrial ancestor to
the nucleus of the host would both reduce the mitochondrial
genome size and stabilize the symbiotic relationship. Phylogenetic
reconstructions that identify genes in the Rickettsia genome as sister
clades to eukaryotic homologues found in the nucleus or the
organelle support this interpretation. Rickettsia and mitochondria
probably share an α-proteobacterial ancestor and a similar evolutionary
history.

General features of the genome 

The circular chromosome of R. prowazekii strain Madrid E has
1,111,523 bp and an average G+C content of 29.1% (Figs 1, 2). The
genome contains 834 complete open reading frames with an average
length of 1,005 bp. Protein-coding genes represent 75.4% of the
genome and 0.6% of the genome encodes stable RNA. We have
assigned biological roles to 62.7% of the identified genes and 
pseudogenes; 12.5% of the identified genes match hypothetical 
coding sequences of unknown function and the remaining 24.8% 
represent unusual genes with no similarities to genes in other 
organisms (Table 1). Multivariate statistical analysis has shown 
that there is no major variation in codon-usage patterns among 
genes that are expressed in different amounts, indicating that 
codon-usage patterns in R. prowazekii may be dominated mainly
by mutational forces 14 . G+C-content values at the three codon 
positions average 40.4, 31.2 and 18.6%, and these values are similar 
at different positions in the genome. We classified the open reading 
frames with significant sequence-similarity scores to gene sequences 
in the public databases into functional categories (Table 1) that 
allow comparisons with the metabolic profiles of other bacterial 
genomes 15-23 . 
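
The coding fraction quoted above can be checked directly from the reported open reading frame count and mean length. The following is a minimal sketch in Python, not part of the original analysis; the constant and function names are illustrative assumptions, and only the numbers are taken from the text.

```python
# Values quoted in the text; all names below are illustrative assumptions.
GENOME_BP = 1_111_523                    # chromosome length in base pairs
N_ORFS = 834                             # complete open reading frames
MEAN_ORF_BP = 1_005                      # average ORF length in base pairs
CODON_POSITION_GC = (40.4, 31.2, 18.6)   # G+C (%) at codon positions 1, 2, 3


def coding_fraction(n_orfs: int, mean_orf_bp: float, genome_bp: int) -> float:
    """Fraction of the genome covered by ORFs, assuming they do not overlap."""
    return n_orfs * mean_orf_bp / genome_bp


def mean_gc(values) -> float:
    """Unweighted mean of per-position G+C percentages."""
    return sum(values) / len(values)


coding_pct = 100 * coding_fraction(N_ORFS, MEAN_ORF_BP, GENOME_BP)
coding_gc_pct = mean_gc(CODON_POSITION_GC)

print(f"protein-coding share of genome: {coding_pct:.1f}%")
print(f"mean G+C over the three codon positions: {coding_gc_pct:.1f}%")
```

Under the non-overlap assumption the product 834 x 1,005 bp reproduces the stated 75.4% coding share, and the mean G+C over the three codon positions comes out slightly above the 29.1% genome-wide average, as expected if spacer DNA is more A+T-rich than coding DNA.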

Non-coding DNA. The coding content of previously sequenced 
bacterial genomes is, on average, 91%, ranging from 87% in 
Haemophilus influenzae to 94% in Aquifex aeolicus. In comparison,
a large fraction of the R. prowazekii genome, 24%, represents non- 
coding DNA (Fig. 3). A small fraction of this corresponds to

Figure 1 Overall structure of the R. prowazekii genome. The putative origin of
replication is at 0 kb. The outer scale indicates the coordinates (in base pairs). The
positions of pseudogenes are highlighted with death's heads. The distribution of
genes is shown on the first two rings within the scale. The location and direction
of transcription of rRNA are shown by pink arrows and of tRNA genes by black
arrows. The next circle in shows GC-skew values measured over all bases in the
genome. Red and purple colours denote positive and negative signs, respectively.
The window size was 10,000 nucleotides and the step size was 1[...]
nucleotides. The central circles show GC-skew values calculated for third
positions in the codon only. GC-skew values were calculated separately for genes
located on the outer strand (green) and on the inner strand (blue). To allow easy
visual inspection, the signs of the values calculated for genes located on the inner
strand have been reversed.



pseudogenes (0.9% of the genome) and less than 0.2% of the 
genome is accounted for by non-coding repeats. The remaining 
22.9% contains no open reading frames of significant length and it 
has the low G+C content (mean 23.7%) that is characteristic of 
spacer sequences in the R. prowazekii genome 14 . A region of 30
kilobases (kb) located at position 886-916 kb contains as much as
41.6% non-coding DNA and 11.5% pseudogenes. The non-coding
DNA in this region has a small, but significantly higher, G+C
content (mean 27.3%) than non-coding DNA in other areas of the 
genome (mean 23.7%) (P < 0.001), indicating that it may corre- 
spond to inactivated genes that are being degraded by mutation 
(Fig. 3). 
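
The percentages in this paragraph and in the general-features paragraph form a simple budget that can be tallied as a consistency check. The sketch below is an illustration only; the dictionary keys and variable names are assumptions, and the "less than 0.2%" figure for non-coding repeats is treated as an upper bound of 0.2%.

```python
# Genome budget in per cent of the chromosome, as quoted in the text.
# Key names are illustrative; 0.2 is an upper bound ("less than 0.2%").
BUDGET_PCT = {
    "protein_coding": 75.4,
    "stable_rna": 0.6,
    "pseudogenes": 0.9,
    "noncoding_repeats": 0.2,
    "spacer": 22.9,
}
NONCODING_KEYS = ("pseudogenes", "noncoding_repeats", "spacer")

noncoding_pct = sum(BUDGET_PCT[key] for key in NONCODING_KEYS)
total_pct = sum(BUDGET_PCT.values())

# The 30-kb region at 886-916 kb: convert the quoted shares to base pairs.
REGION_BP = 30_000
region_noncoding_bp = round(REGION_BP * 41.6 / 100)
region_pseudogene_bp = round(REGION_BP * 11.5 / 100)

print(f"non-coding total: {noncoding_pct:.1f}%")
print(f"whole-genome total: {total_pct:.1f}%")
print(f"region: {region_noncoding_bp} bp non-coding, "
      f"{region_pseudogene_bp} bp pseudogene")
```

With the repeat share taken at its upper bound, the three non-coding classes sum to the 24% quoted for the genome, and the five classes together account for the whole chromosome.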

Origin of replication. The origin of replication has not been
experimentally identified in the R. prowazekii genome, but we
identified dnaA at ~750 kb. However, the genes flanking the dnaA
gene differ from the conserved motifs found in Escherichia coli and
Bacillus subtilis (rnpA-rpmH-dnaA-dnaN-recF-gyrB). In R. prowazekii,
the genes rnpA and rpmH are located in the vicinity of
dnaA, but in the reverse orientation compared to the consensus
motif, and dnaN, recF and gyrB are located elsewhere.

The origin and end of replication in microbial genomes are often
associated with transitions in GC skew ((G - C)/(G + C)) values 24 . In
R. prowazekii we observe transitions in the GC skew values at
around 0 and 500-600 kb (Fig. 1). There is a weak asymmetry in
the distribution of genes in the two strands, such that the first half of
the genome has a 1.6-fold higher gene density on one strand and the
second half of the genome has a 1.6-fold higher gene density on the
other strand. The shift in coding-strand bias correlates with the shift
in GC-skew values. As most genes are transcribed in the direction of
replication in microbial genomes, the origin of replication may
correspond to the shift in GC-skew values at the position that we
have chosen as the start point for numbering. Indeed, several short
sequence stretches that are characteristic of dnaA-binding motifs are
found in the intergenic region of genes RP001 and RP885 at 0 kb,
supporting this interpretation.
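
The GC-skew analysis described above can be sketched as a sliding-window calculation of (G - C)/(G + C) followed by a search for sign changes. The code below is a minimal illustration on a toy sequence, not the authors' implementation; the function names are assumptions, and the window and step sizes are scaled down for the toy sequence (the Figure 1 caption gives a 10,000-nucleotide window for the real genome).

```python
def gc_skew(seq: str) -> float:
    """Return (G - C)/(G + C) for one sequence; 0.0 if it has no G or C."""
    g = seq.count("G")
    c = seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def sliding_gc_skew(seq: str, window: int, step: int):
    """List of (window start, GC skew) pairs over a linear sequence."""
    seq = seq.upper()
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def skew_transitions(points):
    """Window starts at which the sign of the skew flips (zeros ignored)."""
    transitions = []
    last_sign = 0
    for start, value in points:
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue
        if last_sign != 0 and sign != last_sign:
            transitions.append(start)
        last_sign = sign
    return transitions


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGCAT" * 50 + "CCCGAT" * 50
points = sliding_gc_skew(toy, window=60, step=30)
print(skew_transitions(points))
```

On a circular chromosome the windows would also need to wrap around the numbering start point; that step is omitted here for brevity. For real data, an established library routine for GC skew could be substituted for the hand-written function.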

Stable RNA sequences and repeat elements. We identified 33 genes
encoding transfer RNA, corresponding to 32 different isoacceptor
tRNA species. There is a single copy of each of the rRNA genes, with
rrs located more than 500 kb away from the rrl-rrf gene cluster



Figure 2 Linear map of the R. prowazekii chromosome. The position and
orientation of known genes are indicated by arrows. Coding regions are colour-
coded according to their functional roles. The positions of tRNA genes are
indicated (inverted triangle on stalk). For additional information, see
http://evolution.bmc.uu.se/~siv/gnomics/Rickettsia.html.

Table 1. Functional classification of Rickettsia prowazekii protein-coding genes. Gene numbers correspond to those in Fig. 2.
Percentages represent per cent identities.

cell division protein FtsA (B-Hin, 29.5%)
cell division protein FtsH (B-Eco, 54.0%)
cell division protein FtsJ (B-Eco, 44.4%)
cell division protein FtsK (B-Cfcu, 41.5%)
cell division protein FtsQ (B-Hin, 17.9%)
cell division protein FtsW (B-Eco, 33.2%)
cell division protein FtsY (B-Hin, 43.2%)
cell division protein FtsZ (B-Wsp, 65.3%)
glucose inhibited division protein A (B-Eco, 48.8%)
glucose inhibited division protein (B-Ppu, 26.8%)
MAF protein (B-Ssu, 38.1%)
cell cycle protein MesJ (B-Eco, 22.1%)
rod shape-determining protein (B-Eco, 60.5%)
rod shape-determining protein (B-Eco, 23.1%)
rod shape-determining protein (B-Eco, 38.1%)



hemolysin (B-77ry. 34^%) 
hemolysin (B-77y. 28^%) 



RP535 sodB superoxide dismutase (B-Lpn, 53.4%)
RP759 thdF thiophene and furan oxidizer (B-Hin, 34.7%)

Protein and peptide secretion

RP315 aprD protease secretion ATP-binding protein (B-Pae, 40.0%)
RP314 aprE protease secretion ATP-binding protein (B-Pae, 32.4%)
RP173 ffh signal recognition particle receptor protein (B-Eco, 49.6%)
RP275 lepA GTP-binding membrane protein (B-Bsu, 57.0%)
RP116 lepB signal peptidase (B-Sty, 37.3%)
RP575 secA preprotein translocase SecA subunit (B-Rca, 51.8%)
RP070 secB preprotein translocase SecB subunit (B-Hin, 30.7%)
RP586 secD protein-export membrane protein (B-Eco, 40.4%)
RP134 secE preprotein translocase SecE subunit (B-Bsu, 37.3%)
RP114 secF protein-export membrane protein (B-Hin, 37.7%)
RP079 secG protein-export membrane protein (B-Hpy, 32.0%)
RP639 secY preprotein translocase SecY subunit (B-Eco, 50.0%)
RP842 tig trigger factor (B-Eco, 32.0%)

ENERGY METABOLISM 67

ATP-proton motive force
RP803 atpA ATP synthase F1 alpha subunit (B-finj, 66.2%)
RP023 atpB ATP synthase F0 subunit a (B-Rru, 51.5%)
RP800 atpC ATP synthase F1 epsilon subunit (B-Rru, 24.5%)
RP801 atpD ATP synthase F1 beta subunit (B-Rca, 77.0%)
RP022 atpE ATP synthase F0 subunit c (E-Ram mt, 43^i)
RP020 atpF ATP synthase F0 subunit b (B-Rru, 21.1%)
RP802 atpG ATP synthase F1 gamma subunit (B-flW, 38.0%)
RP804 atpH ATP synthase F1 delta chain (E-Os/cp, 26.4%)
RP021 atpX ATP synthase F0 subunit b' (E-Ram mt, 2a 6%)

Electron transport

RP588 ccmE cytochrome c biogenesis protein (B-Hin, 33.2%)
RP703 ccmF cytochrome c biogenesis protein (E-Ram mt, 33.6%)
RP405 coxA cytochrome c oxidase subunit I (E-Wpo mt, 68.9%)
RP406 coxB cytochrome c oxidase subunit II (E-Wpo mt, 48.0%)
RP191 coxC cytochrome c oxidase subunit III (E-[...])
RP257 coxW cytochrome c oxidase assembly (E-Sce, 35.2%)
RP304 coxil cytochrome c oxidase assembly (E-Ram mt, 4^9%)
RP346 ctaB cytochrome c oxidase assembly factor (B-We, 40.6%)
RP253 cycW cytochrome c (B-Bja, 35£%)
RP216 cydA cytochrome oxidase d subunit I (B-Avi, 34.0%)
RP217 cydB cytochrome oxidase subunit II (B-Eco, 30.0%)
RP272 fbcH ubiquinol cytochrome c oxidoreductase, cytochrome c1 subunit (B-Bja, 47.8%)
RP829 fdxA ferredoxin (flea, 57.5%)
RP357 nuoA NADH dehydrogenase I chain A (E-Pw/ mt, 64.6%)
RP356 nuoB NADH dehydrogenase I chain B (E-Ram mt, 73.2%)
RP355 nuoC NADH dehydrogenase I chain C (B-Po«, 42.1%)
RP354 nuoD NADH dehydrogenase I chain D (E-Ram mt, 71.4%)
RP353 nuoE NADH dehydrogenase I chain E (E-Rno mt, 55.1%)



RPM5 
RP797 
RP7P6 
RP7*»b 
RP79'. 
RP791 
RP792 
RP283 
RP282 
RP793 
RP537 
RP284 
RP27U 
RP271 
RP663 



nuoF NADH dehydrogenase I chain F (B-Pde, 69
nuoG NADH dehydrogenase I chain G (E-Bta mt, 49.3%)
nuoH NADH dehydrogenase I chain H (E-Ram mt, 63.5%)
nuoI NADH dehydrogenase I chain I (E-Ram mt, 71.4%)
nuoJ NADH dehydrogenase I chain J (E-Ram mt, 42.3%)
nuoK NADH dehydrogenase I chain K (B-Pde, 61.4%)
nuoL1 NADH dehydrogenase I chain L (E-Ram mt, 45.6%)
nuoL2 NADH dehydrogenase I chain L (E-A/acc, 27 or.)
nuoL3 NADH dehydrogenase I chain L (E-Am© mt, 17.0%)
nuoM NADH dehydrogenase I chain M (E-Ram mt, 46.6%)
nuoN1 NADH dehydrogenase I chain N (E-Ram mt, 34.6%)
nuoN2 NADH dehydrogenase I chain N (E-Ram mt, 23.2%)
petA Rieske iron sulphur protein (B-Bja, 58.3%)
petB cytochrome b (B-Rru, 65.4%)
pntAA NAD(P) transhydrogenase α subunit (B-Eco, 37.7%)
pntAB NAD(P) transhydrogenase α subunit (B-Hin, 44.7%)
pntB NAD(P) transhydrogenase β subunit (B-Hin, 51.6%)



RP074 



Fermentation

RP110 ackA acetate kinase (B-Cte, 38.4%)
[...]92 ppdK pyruvate, orthophosphate dikinase (E-F*, 48^%)

Phosphate

RP589 ppa inorganic pyrophosphatase (B-Eco, 59.3%)

Pyruvate dehydrogenase

RP261 pdhA pyruvate dehydrogenase E1 component α subunit (E-AfA, 44.1%)
RP262 pdhB pyruvate dehydrogenase E1 component β subunit (E-Sce, 59.7%)
RP530 pdhC dihydrolipoamide acetyltransferase E2 component (E-Rno, 45.1%)
RP460 pdhD dihydrolipoamide dehydrogenase E3 component
RP805 pdhD dihydrolipoamide dehydrogenase E3 component (Zyrn, 61.1%)

TCA cycle

RP799 acnA aconitate hydratase (B-Lpn, 59.1%)
RP665 fumC fumarate hydratase (B-Ror, 63.5%)
RP844 gltA citrate synthase (B-Rty, 97^%)
RP265 icd isocitrate dehydrogenase (B-Trft, 3a6%)
RP376 mdh malate dehydrogenase (B-Can, 51.5%)
RP128 sdhA succinate dehydrogenase, flavoprotein subunit (B-fi^a, 70.0%)
RP044 sdhB succinate dehydrogenase, iron-sulphur protein (E-Ram mt, 69.0%)
RP126 sdhC succinate dehydrogenase, cytochrome b[...] subunit (E-Ram mt, 39.5%)
RP127 sdhD succinate dehydrogenase, subunit IV (E-Ram mt, 25.6%)
RP180 sucA 2-oxoglutarate dehydrogenase (B-H2a, 44.3%)
RP179 sucB dihydrolipoamide succinyltransferase (B-Eco, 43.7%)
RP433 sucC succinyl-CoA synthetase, β subunit (B-Eco, 52.1%)
RP432 sucD succinyl-CoA synthetase, α subunit (E-Dtfi, 70.7%)



RP509 exoC phosphomannomutase (B-Abr, 42.7%)
RP299 lacA galactoside acetyltransferase (B-A^g, 44.4%)

FATTY ACID AND PHOSPHOLIPID METABOLISM 25

aas 2-acylglycerophosphate-etha[...] (B-Eco, 39.9%)
acor acyl-CoA desaturase (E-Yeast, 27.6%)
acpP acyl carrier protein (B-Lmu, 52.6%)
acpS holo-[acyl carrier protein] synthase (B-Eco, 38.5%)
birA biotin Ac-CoA carboxylase synthase (B-Pbe, 33.6%)
cdsA phosphatidate cytidylyltransferase (B-Eco, 31.5%)
fabD malonyl-CoA:acyl carrier protein transacylase (B-Bsu, 40.3%)
3-oxoacyl-[acyl-carrier-protein] synthase II (B-Eco, 53.5%)
fabG 3-oxoacyl-[acyl-carrier-protein] reductase (B-Rme, 54.8%)
fabH 3-oxoacyl-[acyl-carrier-protein] synthase (B-Rca, 47.3%)
fabI enoyl-[acyl-carrier-protein] reductase (B-Asp, 49.0%)
fabZ 3R-hydroxymyristoyl acyl carrier protein dehydratase (B-fl/t, 01.7%)
fadA acetyl-CoA acetyltransferase (B-Poe, 54.5%)
fadB fatty oxidation complex α subunit (E-Ge*, 30.6%)
gpsA glycerol-3-phosphate dehydrogenase (B-Eco, 32.1%)
lgt prolipoprotein diacylglycerol (B-Eco, 39.1%)
phbB acetoacetyl-CoA reductase (B-Zra, 52.9%)
phbC1 poly-beta-hydroxybutyrate polymerase (B-Cvt, 22.7%)
phbC2 poly-beta-hydroxybutyrate polymerase (B-Max, 37.4%)
pgpA phosphatidylglycerophosph[...]
pgsA phosphatidylglycerophosphate synthase (B-Bsu, 40.1%)
plsC 1-acyl-glycerol-3-phosphate acyltransferase (E-Sce, 23.6%)
pssA phosphatidylserine synthase (B-Hpy, 28^.)
hie malic enzyme (B-Hia, 45.5%)
vacJ VacJ lipoprotein precursor (B-Sd, 33.8%)



RP038 
RP763 
RP577 
RP533 
RP424 
RP735 



RP764 fabF 



RP365 
RP008 

RP737 
RP560 
RP442 

RP046 
RP035 
RP738 



RP750 
RP049 



RP242 
RP373 
RP093 



PURINES, PYRIMIDINES 14

Deoxyribonucleotide metabolism

RP[...] dcd deoxycytidine triphosphate deaminase (B-[...], 27[...]%)
RP064 dgtP deoxyguanosine triphosphate triphosphohydrolase (B-Eco, 30.1%)
RP399 dut deoxyuridine 5'-triphosphate nucleotidohydrolase (B-Hin, 43.1%)
RP055 ndk nucleoside diphosphate kinase (B-Bsu, 56.4%)
RP513 nrdA ribonucleotide reductase α chain (E-Sce, 33.7%)
RP512 nrdB ribonucleotide reductase β chain (E-Mmu, 28.6%)
RP301 thyA thymidylate synthase (E-Dd/, 47.9%)
RP684 tmk thymidylate kinase (B-Bsu, 38.7%)

Purine ribonucleotide metabolism

RP633 adk adenylate kinase (B-Sco, 33^%)
RP765 gmk guanylate kinase (B-Hin, 49.0%)
RP220 purC phosphoribosylaminoimidazole-succinocarboxamide synthase (B-Spa, 34.0%)

Pyrimidine ribonucleotide biosynthesis

RP522 cmk cytidylate kinase (B-Eco, 34.9%)
RP378 pyrG CTP synthase (B-Abr, 54.7%)
RP155 pyrH uridylate kinase (B-Syn, 53.3%)

REGULATORY FUNCTIONS 14

RP229 barA histidine kinase sensor protein (B-Eco, 23.2%)
RP071 czcR transcriptional activator protein (B-Aeu, 35.1%)
RP426 envZ histidine kinase osmolarity sensor protein (B-Su, 23.6%)
RP294 gppA pppGpp phosphohydrolase (B-Hpy, 23.3%)
RP011 nifftt transcriptional activator nitrogen assimilation protein (B-Abr, 50.0%)
RP614 ntrY histidine kinase nitrogen sensor protein (B-Aca, 30.6%)
RP562 ntrX transcriptional activator nitrogen assimilation protein (B-Aca, 45.2%)
RP427 ompR transcriptional activator protein OmpR (B-Rca, 42.0%)
RP465 phoR histidine kinase phosphatase synthesis sensor protein (B-Bsu, 24.4%)
RP312 spoT' (p)ppGpp 3'-pyrophosphohydrolase (B-Eco, 29.0%)
RP624 spoT' (p)ppGpp 3'-pyrophosphohydrolase (B-Mpn, 27.8%)
RP625 spoT* (p)ppGpp 3'-pyrophosphohydrolase (B-Eco, 48.7%)
RP705 spoV (p)ppGpp 3'-pyrophosphohydrolase (B-Eco, 31.7%)
RP517 yhbH sigma 54 modulation protein (B-fifcia, 26.2%)

REPLICATION 46

RP172

Degradation of DNA

RP734 addA ATP-dependent nuclease (B-Bsu, 23.7%)
RP200 xthA1 exodeoxyribonuclease III (B-Eco, 30.1%)
RP676 xthA2 exodeoxyribonuclease III (B-Eco, 3^2%)
RP675 xseA exodeoxyribonuclease large subunit (B-Eco, 31.7%)
RP350 xseB exodeoxyribonuclease small subunit (B-Eco, 32.5%)

DNA replication, restriction, modification, recombination and repair

RP601 dnaA chromosomal replication initiation protein DnaA (B-Eco, 44.1%)
RP542 dnaB DNA helicase (E-Os/cp, 40.9%)
RP778 dnaE DNA polymerase III alpha subunit (B-Sty, 37.2%)
RP&50 dnaG DNA primase (B-SrV, 20.0%)
RP410 dnaN DNA polymerase III beta subunit (B-Ppu, 20.0%)
RP732 dnaQ DNA polymerase III epsilon subunit (B-Sty, 46.7%)
RP865 dnaX DNA polymerase III gamma chain (B-Eco, 31.4%)
RP206 gyrA DNA gyrase A subunit (B-Rsp, 49.4%)
RP227 gyrB1 DNA gyrase B subunit (B-Sct, 42.0%)
RP580 gyrB2 DNA gyrase B subunit (B-^ju, 51.5%)
holB DNA polymerase III, delta prime subunit (B-fbe, 22.3%)
RP171 hupA DNA binding protein HU (B-Vpr, 47.3%)
RP720 lig DNA ligase (B-Zmo, 45.7%)
RP777 metK* S-adenosylmethionine synthetase (B-Eco, 66.3%)
RP596 mfd transcription-repair coupling factor (B-Hin, 33.9%)
RP351 mpg DNA-3-methyladenine glycosidase (E-Hsa, 29.7%)
RP880 mutL DNA mismatch repair protein MutL (B-Spn, 35.4%)
RP298 mutS DNA mismatch repair protein MutS (B-Bsu, 39.0%)
RP746 nth endonuclease III (B-Eco, 50.7%)
RP067 parC DNA topoisomerase IV subunit A (B-Hin, 30.0%)
RP711 pin' invertase/recombinase (B-Eco, 38.0%)
RP776 polA DNA polymerase I (B-Sca, 37^%)
RP540 priA primosomal protein replication factor (B-Rru, 39.7%)
RP546 radA DNA repair (B-Bsu, 46.5%)
RP761 recA recombination protein RecA (B-Pde, 71.2%)
RP029 recF DNA repair protein, ATP binding protein (B-Ccr, 30.4%)
RP503 recG ATP-dependent DNA helicase (B-Eco, 34.1%)
RP528 recJ single-stranded DNA-specific exonuclease (B-Eco, 32.8%)
RP182 recN recombination protein RecN (B-Hin, 31.6%)
RP438 recR recombination protein RecR (B-Ssu, 36.0%)
RP385 ruvA Holliday junction DNA helicase (B-Pae, 35.0%)
RP386 ruvB Holliday junction DNA helicase (B-Pae, 51.5%)
RP110 ruvC Holliday junction endodeoxyribonuclease (B-Eco, 36.1%)
RP836 ssb single-stranded binding protein (B-Bab, 52.6%)
RP326 topA DNA topoisomerase I (B-Bsu, 44.0%)
RP835 uvrA repair excision nuclease subunit A (B-Eco, 57.7%)
RP203 uvrB repair excision nuclease subunit B (B-Hin, 56.0%)
RP572 uvrC repair excision nuclease subunit C (B-PH, 36.0%)
RP447 uvrD DNA helicase (B-Sau, 43.5%)
RP817 xerC integrase/recombinase (B-Bsu, 32.2%)
RP361 xerD integrase/recombinase (B-Eco, 37.6%)

TRANSCRIPTION

Degradation of RNA

RP504 pnp polyribonucleotide nucleotidyltransferase (B-5x>, 48.0%)
RP117 rnc ribonuclease III (B-Hpy, 40.2%)
RP462 rnd ribonuclease D (B-Eco, 28.5%)
RP256 rne ribonuclease E (B-Eco, 35.9%)
RP726 rnhA ribonuclease HI (B-Wsm, 43.4%)
RP202 rnhB ribonuclease HII (B-Eco, 44.7%)
RP611 rnpA ribonuclease P (B-Atea, 28.4%)
RP628 rph ribonuclease PH (B-Hin, 55.05%)

RNA synthesis and modification

RP861 greA transcription elongation factor GreA (B-Hin, 61.4%)
RP553 nusA transcription termination factor NusA (B-Eco, 36.0%)
RP162 nusB transcription termination factor NusB (B-Eco, 32.0%)
RP135 nusG transcription antitermination protein NusG (B-Eco, 42.2%)
RP015 pcnB poly(A) polymerase I (B-Bsu, 26.3%)
RP669 rhlE ATP-dependent RNA helicase (B-Eco, 38.3%)
RP52S rho transcription termination factor Rho (B-Rsp, 72.0%)
RP635 rpoA RNA polymerase alpha subunit (B-Bpe, 47.2%)
RP140 rpoB RNA polymerase beta subunit (B-Rty, 87.4%)
RP141 rpoC RNA polymerase beta' subunit (B-Eco, 58.8%)
RP303 rpoH RNA polymerase sigma-32 factor (B-Atu, 52.0%)
RP858 rpoD RNA polymerase sigma-70 factor (B-Rca, 50.5%)

TRANSLATION 118

Aminoacyl-tRNA synthetases

RP&56 alaS alanyl-tRNA synthetase (B-B&a, 52.7%)
RP065 argS arginyl-tRNA synthetase (B-Hpy, 33.0%)
RP145 aspS aspartyl-tRNA synthetase (B-Syn, 43.3%)
RP085 cysS cysteinyl-tRNA synthetase (B-HH, 46.0%)
RP325 giOQ glutamyl-tRNA synthetase (B-Abr, 45.6%)
RP623 gtVQ glutamyl-tRNA synthetase (B-Hpy, 40.3%)
RP850 glyQ glycyl-tRNA synthetase (B-Afca, 60.6%)
RP849 glyS glycyl-tRNA synthetase (B-Bsu, 32.9%)
RP306 hisS histidyl-tRNA synthetase (B-Eco, 38.3%)
RP620 ileS isoleucyl-tRNA synthetase (B-Mfu, 48.6%)
RP421 leuS leucyl-tRNA synthetase (B-Eco, 45.3%)
RP371 lysS lysyl-tRNA synthetase (B-Bbu, 26.3%)
RP683 metS methionyl-tRNA synthetase (B-Bsu, 46.0%)
RP417 pheS phenylalanyl-tRNA synthetase alpha sub (B-Hin, 40.2%)
RP418 pheT phenylalanyl-tRNA synthetase beta sub (B-Hin, 33.9%)
RP384 proS proline-tRNA synthetase (B-Zmo, 51.8%)
RP783 serS seryl-tRNA synthetase (B-Cbu, 47.2%)
RP221 thrS threonyl-tRNA synthetase (B-Hin, 50.0%)
RP468 trpS tryptophanyl-tRNA synthetase (B-^a, 48.5%)
RP556 tyrS tyrosyl-tRNA synthetase (B-Sca, 38.7%)
RP667 valS valyl-tRNA synthetase (A-A^a, 38.3%)


tRNA and aminoacyl-tRNA modification

RP208 def methionyl-tRNA deformylase (B-Eco, 49.4%)
RP200 fmt methionyl-tRNA formyltransferase (B-Hin, 41.[...]%)
RP152 gatA glutamyl-tRNA (Gln) amidotransferase subunit A (B-Mca, 48.6%)
RP151 gatB glutamyl-tRNA (Gln) amidotransferase subunit B (B-Mca, 46.9%)
RP153 gatC glutamyl-tRNA (Gln) amidotransferase subunit C (B-Bsu, 24.7%)
RP672 ksgA dimethyladenosine transferase (B-Bsu, 35.7%)
RP510 miaA tRNA delta-2-isopentenylpyrophosphate (IPP) transferase (B-Afu, 30.7%)
RP605 pth peptidyl-tRNA hydrolase (B-Hin, 40.5%)
RP213 queA S-adenosylmethionine:tRNA ribosyltransferase-isomerase (B-Hin, 43.3%)
RP721 tgt tRNA-guanine transglycosylase (B-Zmo, 61.2%)
RP111 trmD tRNA (guanine-N1)-methyltransferase (B-Eco, 44.7%)
RP657 truA pseudouridylate synthase I (B-Eco, 40.1%)
RP501 truB tRNA pseudouridine 55 synthase (B-Mfn, 37.6%)


Degradation of proteins, peptides and glycopeptides

RP036 clpB ATP-dependent protease, ATP binding subunit (B-Hin, 54.3%)
RP520 clpP ATP-dependent Clp protease (B-Vbn, 67%)
RP602 clpX ATP-dependent protease, ATPase subunit (B-Eco, 62^%)
RP228 ctp tail-specific protease precursor (B-Bba, 42.6%)
RP037 gcp sialoglycoprotein endopeptidase (B-Hin, 42.2%)
RP123 hflC lambda cII stability-governing protein (B-Eco, 33.0%)
RP122 hflK lambda cII stability-governing protein (B-V^aa, 30.3%)
RP124 htrA serine protease (B-Bha, 37.7%)
RP186 htrA protease DO (E-Sce, 26.7%)
RP450 lon ATP-dependent protease LA (B-Ccr, 53.1%)
RP408 lspA lipoprotein signal peptidase (B-Bsu, 27.9%)
RP824 map methionyl aminopeptidase (B-Sty, 55.3%)
RP219 mpp mitochondrial protease (B-Bsu, 35.4%)
RP142 pepA aminopeptidase A (B-Pea, 36.6%)
RP174 ppcE peptidase II (B-Rsn, 3Z5%)
RP281 ptrB protease II (B-Eco, 34^%)
RP308 sohB protease IV (B-Mja, 23.9%)
RP525 sppA protease IV (B-Hin, 27.6%)


Protein modification and translation factors

RP238 efp elongation factor P (B-Bsu, 39.5%)
RP132 fusA elongation factor G (B-Afu, 68.7%)
RP814 infA initiation factor IF-1 (B-Mn, 67.1%)
RP552 infB initiation factor IF-2 (B-Hin, 42.6%)
RP531 infC initiation factor (B-Pw, 47.7%)
RP529 prfA peptide chain release factor RF-1 (B-Bsu, 50.1%)
RP274 prfB peptide chain release factor RF-2 (B-Eco, 50.4%)
RP435 rbfA ribosome binding factor A (B-Bsu, 31.6%)
RP603 rimJ ribosome protein alanine acetyltransferase (B-Eco, 2a2%)
RP154 rrf ribosome recycling factor (B-Hix, 43.3%)
RP397 SpA thiol:disulphide interchange protein (B-Bja, 27.4%)
RP661 tuf elongation factor Tu (B-Tcu, 81.5%)
RP087 tsf elongation factor Ts (B-Sci, 40.7%)


Ribosomal proteins: synthesis and modification

RP137 rplA ribosomal protein L1 (B-Cgr, 5a2%)
RP656 rplB ribosomal protein L2 (E-Ram mt, 61.5%)
RP659 rplC ribosomal protein L3 (E-Sce, 44.1%)
RP658 rplD ribosomal protein L4 (B-Sst, 39.3%)
RP647 rplE ribosomal protein L5 (B-Aco, 53.6%)
RP644 rplF ribosomal protein L6 (B-Ssf, 45.4%)
RP041 rplI ribosomal protein L9 (E-^Jtfcp, 33.6%)
RP138 rplJ ribosomal protein L10 (B-Laf, 36.7%)
RP136 rplK ribosomal protein L11 (E-Ram mt, 45.5%)
RP139 rplL ribosomal protein L7/L12 (B-Bab, 66.9%)
RP233 rplM ribosomal protein L13 (B-Sca, 52.8%)
RP649 rplN ribosomal protein L14 (B-Eco, 69.6%)
RP640 rplO ribosomal protein L15 (B-Bsl, 46.5%)
RP652 rplP ribosomal protein L16 (B-Aac, 53.3%)
RP634 rplQ ribosomal protein L17 (B-Eco, 57.5%)
RP643 rplR ribosomal protein L18 (B-Sst, 58.6%)
RP112 rplS ribosomal protein L19 (B-Eco, 58^%)
RP609 rplT ribosomal protein L20 (B-Psy, 61.5%)
RP751 rplU ribosomal protein L21 (E-Sot, 42.3%)
RP654 rplV ribosomal protein L22 (B-Eco, 50.0%)
RP657 rplW ribosomal protein L23 (B-Sst, 46.3%)
RP648 rplX ribosomal protein L24 (B-Bst, 55.0%)
RP606 rplY ribosomal protein L25 (B-Wlu, 26.0%)
RP752 rpmA ribosomal protein L27 (E-Ram mt, 62.9%)
RP099 rpmB ribosomal protein L28 (B-Mge, 43.7%)
RP651 rpmC ribosomal protein L29 (B-Bst, 30.4%)
RP641 rpmD ribosomal protein L30 (B-Mlu, 33.3%)
RP100 rpmE ribosomal protein L31 (B-MftA, 31.6%)
RP773 rpmF ribosomal protein L32 (B-Hin, 40.1%)
RP879 rpmG ribosomal protein L33 (B-Hin, 51.8%)
RP610 rpmH ribosomal protein L34 (B-Ppu, 65.9%)
RP608 rpmI ribosomal protein L35 (E-flpu cp, 38.5%)
RP456 rpmJ ribosomal protein L36 (E-Gtft cp, 65.8%)
RP521 rpsA ribosomal protein S1 (B-Rme, 48.6%)
RP086 rpsB ribosomal protein S2 (B-Aflu, 41.5%)
RP653 rpsC ribosomal protein S3 (B-Bst, 54.2%)
RP345 rpsD ribosomal protein S4 (B-Bsu, 43.2%)
RP642 rpsE ribosomal protein S5 (B-Bst, 50.6%)
RP039 rpsF ribosomal protein S6 (B-Hin, 20.2%)
RP131 rpsG ribosomal protein S7 (E-Ram mt, 45.2%)
RP645 rpsH ribosomal protein S8 (B-Bsu, 42.0%)
RP234 rpsI ribosomal protein S9 (B-Bst, 48.8%)
RP660 rpsJ ribosomal protein S10 (B-W*a, 60.3%)
RP636 rpsK ribosomal protein S11 (B-Syn, 53.5%)
RP130 rpsL ribosomal protein S12 (B-rih, 60.5%)
RP637 rpsM ribosomal protein S13 (B-Bst, 58.3%)
RP646 rpsN ribosomal protein S14 (B-Syn, 47.0%)
RP503 rpsO ribosomal protein S15 (B-Bst, 53.4%)
RP878 rpsP ribosomal protein S16 (B-Hin, 45.1%)
RP650 rpsQ ribosomal protein S17 (B-Ttrt, 61.3%)
RP040 rpsR ribosomal protein S18 (B-Syn, 50.7%)
RP655 rpsS ribosomal protein S19 (B-Eco, 58.2%)
RPC3? rpsT ribosomal protein S20 (B-Rme, 40.9%)
RP615 rpsU ribosomal protein S21 (B-Bbu, 33.3%)

TRANSPORT AND BINDING PROTEINS 38

RP0[...] abcT1 ABC transporter, ATP-binding protein (B-Hin, 55.7%)
RP508 abcT2 ABC transporter, ATP-binding protein (B-Rme, 48.8%)
RP214 abcT3 ABC transporter, ATP-binding protein ([...], 33.6%)
RP387 msbA1 ABC transporter, ATP-binding protein ([...], 26.2%)
RP606 msbA2 ABC transporter, ATP-binding protein ([...], 28.2%)

Amino acids

RP307 atrc1 cationic amino acid transporter (E-Wmu, [...])
RP120 glnP glutamine transport system permease ([...], 48.6%)
RP700 glnQ1 glutamine ABC transporter, ATP-binding p[...]
RP868 glnQ2 glutamine ABC transporter, ATP-binding [...] ([...], 51.0%)
RP176 gltP glutamate-aspartate transporter (B-[...])
RP483 potE putrescine-ornithine transporter (B-Hin, 26.9%)
RP360 potG putrescine ABC transporter, ATP-binding [...] (B-Mpn, 29.2%)
RP077 proP1 proline/betaine transporter (B-Eco, 26.7%)
RP313 proP2 proline/betaine transporter (B-Eco, 24.0%)
RP375 proP3 proline/betaine transporter (B-Eco, 21.2%)
RP685 proP4 proline/betaine transporter (B-Eco, 24.0%)
RP755 proP5 proline/betaine transporter (B-Eco, 27.8%)
RP852 proP6 proline/betaine transporter (B-Eco, 34.8%)
RP881 proP7 proline/betaine transporter (B-Eco, 28.7%)
RP150 yqOC amino acid ABC transporter (B-Bsu, 32.4%)


Nucleosides and nucleotides

RP007 mfd ribonucleotide ABC transporter, ATP-binding [...] (B-Mte, 36.2%)
RP053 tlc1 ATP/ADP translocase (B-Ctr, 43.3%)
RP377 tlc2 ATP/ADP translocase (B-Ctr, 35.2%)
RP477 tlc3 ATP/ADP translocase (B-Ctr, 30.6%)
RP500 tlc4 ATP/ADP translocase (B-Ctr, 36.3%)
RP739 tlc5 ATP/ADP translocase (B-Ctr, 34.7%)

Carbohydrates, organic alcohols and acids

RP054 glpT glycerol-3-phosphate permease (B-Bsu, 37.1%)

Cations

RP834 afuC iron ABC transporter, ATP-binding protein ([...], 33.0%)
RP810 kefB glutathione-regulated potassium-efflux system protein (B-Eco, 33.9%)
RP583 mgtB magnesium transporter (B-Syn, 27.0%)

Other

RP205 atm1 mitochondrial ABC transporter, ATP-binding [...] (E-Sce, 43.3%)
RP794 ccmA haem ABC transporter A, ATP-binding protein (B-Hin, 35.5%)
RP268 ccmB haem exporter protein B (E-Ram mt, 20.0%)
RP830 ccmC haem exporter protein C (B-S/a, 43.7%)
RP571 panF pantothenate permease (B-Hin, 20.5%)
RP630 perM permease PerM homologue (B-Hin, 25.0%)
RP374 sec7 transport protein Sec7 (B-Wsa, 20.6%)
RP576 prsA protein export (B-Bsu, 28.0%)

OTHER CATEGORIES

Adaptations to atypical conditions

RP708 himA integration host factor α (B-Eco, 29.5%)
RP236 invA invasion protein A (B-Bba, 42.3%)
RP590 mviN virulence factor MviN protein (B-Sty, 32.4%)
RP717 taxB* conjugative DNA processing (B-Eco, 33.5%)
RP286 trbG conjugal transfer (B-Rsn, 24.7%)
RP103 virB4 virulence protein VIRB4 (B-Atu, 30.9%)
RP764 virB4 virulence protein VIRB4 (B-Atu, 20.3%)
RP287 virB8 virulence protein VIRB8 (B-Atu, 20.4%)
RP290 virB9 virulence protein VIRB9 (B-Atu, 24.8%)
RP291 virB10 virulence protein VIRB10 (B-Atu, 20.3%)
RP292 virB11 virulence protein VIRB11 (B-Atu, 29.6%)
RP293 virD4 virulence protein VIRD4 (B-Atu, 31.3%)

Drug and analogue sensitivity

RP170 acrD acriflavin resistance protein D (B-Eco, 31.3%)
RP475 ampG1 AMPG protein (B-Eco, 31.4%)
RP668 ampG2 AMPG protein (B-Eco, 26.3%)
RP781 ampG3 AMPG protein (B-Eco, 27.6%)
RP603 bcr1 bicyclomycin resistance (B-Eco, 21.7%)
RP698 bcr2 bicyclomycin resistance (B-Eco, 18.8%)
RP243 emrA multidrug resistance protein A (B-Eco, 26.9%)
RP157 emrB multidrug resistance protein B (B-Hin, 20.3%)
RP786 terC tellurite resistance protein (B-Eco, 35.3%)

Colicin-related functions

RP302 tol[...] colicin tolerance protein (B-Hin, 29.8%)
RP300 tolQ inner membrane protein (B-Eco, 30.7%)
RP310 tolR inner membrane protein (B-Pae, 40.1%)

Uncategorized

RP493 addA adducin alpha subunit (E-Hsa, 32.6%)
RP199 adx1 adrenodoxin precursor (E-Spo, 57.1%)
RP714 ank2 ankyrin (E-Hsa, 32.7%)
RP245 bolA BolA protein (B-VaJ, 34.2%)
RP181 ctaQ* thermostable carboxypeptidase (B-Pbo, 29.1%)
RP297 cysQ sulphite synthesis pathway protein (B-Eco, 31.5%)
RP323 cyaY CyaY protein (B-Ec/t, 31.1%)
RP118 era GTP binding protein Era (B-Bsu, 33.6%)
RP063 hesB1 HesB protein (B-Ava, 37.0%)
RP484 hesB2 HesB protein (B-Pbo, 40.2%)
RP212 n2B N2B, ATPase protein (E-H/r, 27.2%)
RP485 nirj nitrogen fixation protein (B-Avi, 43.0%)
RP832 p34 P34 protein (B-fl/t, 01.3%)
RP602 pat1 patatin B1 precursor protein (E-Stu, 22.9%)
RP317 pkcI protein kinase C inhibitor (B-Abr, 38.6%)
RP109 ptb phosphate butyryltransferase (B-Cab, 36.4%)
RP594 scoB* succinyl-CoA:3-ketoacid-CoA transferase su[...] (B-MO/, 22.0%)
RP031 sco2 Sco yeast precursor protein (E-Sce, 32.6%)
RP587 sco2 Sco yeast precursor protein (E-Sce, 36.6%)
RP846 slhB SfhB protein (B-Zmo, 40.6%)
RP430 smpB small protein (B-Syn, 46.7%)
RP058 soj SOJ protein (B-Bsu, 50.4%)
RP486 spl1 tRNA splicing protein (E-Cat, 58.0%)
RP487 spl1 tRNA splicing protein (E-Cat, 32.3%)
RP050 spo0J sporulation protein (B-Bsu, 40.4%)
RP230 suhB suppressor protein (B-Eco, 22.3%)
RP733 surf1 SurF1 protein (E-Hsa, 23.0%)
RP710 tra3* transposase (B-Rmo, 34.0%)

Comparison of the sequences from ten different Rickettsia
species indicates that the disruption of the rRNA gene operon
preceded the divergence of the typhus group and spotted fever
group Rickettsia (S.G.E.A. et al., unpublished observations). In
addition, the genome contains a short sequence with similarity to a
[…]-nucleotide RNA molecule in Bradyrhizobium japonicum that
may regulate transcription 25.
There are unusually few repeat sequences in this genome. We
identified four different types of repeat sequence; all of these are
located in intergenic regions. There is a sequence of 80 bp that is
repeated seven times downstream of rpmH and rnpA in the dnaA
region. A repetitive sequence of 325 bp is found at two intergenic
regions that are more than 80 kb apart, downstream of the genes
[…] and rnh, respectively. A 440-bp-long repetitive sequence has
been identified at two intergenic sites, 140 kb apart; one of these
sites is downstream of rrf and the other is downstream of pdhA and

Table 1 Asterisks indicate putative pseudogenes. Abbreviations of species names:
Bacteria: Acinetobacter calcoaceticus (B-Aca), Actinobacillus
actinomycetemcomitans (B-Aac), Acyrthosiphon condii (B-Aco), Agrobacterium
tumefaciens (B-Atu), Alcaligenes eutrophus (B-Aeu), Anabaena sp. PCC7120
(B-Asp), Anabaena variabilis (B-Ava), Anacystis nidulans (B-Ani),
Azorhizobium caulinodans (B-Aca), Azospirillum brasilense (B-Abr),
Azotobacter vinelandii (B-Avi), Bacillus caldotenax (B-Bca), Bacillus
stearothermophilus (B-Bst), Bacillus subtilis (B-Bsu), Bartonella
bacilliformis (B-Bba), Bartonella henselae (B-Bhe), Bordetella pertussis
(B-Bpe), Borrelia burgdorferi (B-Bbu), Bradyrhizobium japonicum (B-Bja),
Brucella abortus (B-Bab), Brucella ovis (B-Bov), Caulobacter crescentus
(B-Ccr), Chlamydia trachomatis (B-Ctr), Chloroflexus aurantiacus (B-Cau),
Chromatium vinosum (B-Cvi), citrus-greening-disease-associated bacterium
(B-Cgr), Clostridium acetobutylicum (B-Cac), Clostridium pasteurianum
(B-Cpa), Clostridium thermosaccharolyticum (B-Cts), Coxiella burnetii
(B-Cbu), Erwinia chrysanthemi (B-Ech), Escherichia coli (B-Eco),
Haemophilus influenzae (B-Hin), Helicobacter pylori (B-Hpy), Klebsiella
pneumoniae (B-Kpn), Legionella pneumophila (B-Lpn), Leucothrix mucor
(B-Lmu), Liberobacter africanum (B-Laf), Methylobacterium extorquens
(B-Mex), Micrococcus luteus (B-Mlu), Moraxella catarrhalis (B-Mca),
Mycobacterium leprae (B-Mle), Mycobacterium smegmatis ([…]),
Mycobacterium tuberculosis (B-Mtu), Mycoplasma capricolum (B-Mca),
Mycoplasma genitalium (B-Mge), Mycoplasma pneumoniae (B-Mpn), Paracoccus
denitrificans (B-Pde), Pasteurella haemolytica (B-Pha), Plectonema
boryanum (B-Pbo), Proteus mirabilis (B-Pmi), Proteus vulgaris (B-Pvu),
Pseudomonas aeruginosa (B-Pae), Pseudomonas fluorescens (B-Pfl),
Pseudomonas putida (B-Ppu), Pseudomonas syringae (B-Psy), Rhizobium
meliloti (B-Rme), Rhizobium sp. NGR234 (B-Rsp), Rhodobacter capsulatus
(B-Rca), Rhodobacter sphaeroides (B-Rsp), Rhodobacter sulfidophilus
(B-Rsu), Rhodopseudomonas blastica (B-Rbl), Rhodospirillum rubrum (B-Rru),
Rickettsia japonicum (B-Rja), Rickettsia rickettsii (B-Rri), Rickettsia
typhi (B-Rty), Salmonella typhi (B-Sti), Salmonella typhimurium (B-Sty),
Shigella flexneri (B-Sfl), Spiroplasma citri (B-Sci), Staphylococcus
aureus (B-Sau), Staphylococcus carnosus (B-Sca), Streptococcus pneumoniae
(B-Spn), Streptomyces clavuligerus (B-Scl), Streptomyces coelicolor
(B-Sco), Synechocystis PCC 6803 (B-Syn), Thermus aquaticus (B-Taq),
Thermus thermophilus (B-Tth), Thiobacillus cuprinus (B-Tcu), Treponema
hyodysenteriae (B-Thy), Vibrio alginolyticus (B-Val), Vibrio cholerae
(B-Vch), Vibrio parahaemolyticus (B-Vpa), Vibrio proteolyticus (B-Vpr),
Wolbachia sp. (B-Wsp), Yersinia enterocolitica (B-Yen), Zoogloea ramigera
(B-Zra), Zymomonas mobilis (B-Zmo).
Archaea: Methanococcus jannaschii (A-Mja), Sulfolobus acidocaldarius
(A-Sac).
Eukaryotes: Apis mellifera (E-Ame), Arabidopsis thaliana (E-Ath),
Atractylodes japonica (E-Aja), Bos taurus (E-Bta), Candida albicans
(E-Cal), Caenorhabditis elegans (E-Cel), Dictyostelium discoideum (E-Ddi),
Flaveria trinervia (E-Ftr), Giardia theta (E-Gth), Glycine max (E-[…]),
Haematobia irritans (E-Hir), Homo sapiens (E-Hsa), Marchantia polymorpha
(E-Mpo), Mus musculus (E-Mmu), Prototheca wickerhamii (E-Pwi), Petunia
hybrida (E-Phy), Pisum sativum (E-Psa), Porphyra purpurea (E-Ppu),
Odontella sinensis (E-Osi), Reclinomonas americana (E-Ram), Rattus
norvegicus (E-Rno), […] oryzae (E-Ror), Saccharomyces cerevisiae (E-Sce),
Schizosaccharomyces pombe (E-Spo), Solanum tuberosum (E-Stu), Spinacia
oleracea (E-Sol).





pdhB. Finally, two similar sequences of 730 bp are located immedi- 
ately next to each other at 850 kb. 

Paralogous families. We have identified 54 paralogous gene 
families comprising 147 gene products. Of these, 125 have an 
assigned function. Most paralogues encode proteins with transport 
functions, such as the ABC transporters, the proline/betaine trans- 
porters and the ATP/ADP transporters. Five paralogous genes 
located next to each other at 115 kb encode putative integral
membrane proteins with unknown functions. 

Biosynthetic pathways 

A striking feature of the R. prowazekii genome is the small proportion
of biosynthetic genes compared with free-living proteobacterial
relatives (such as Haemophilus influenzae, Helicobacter pylori and
E. coli). This scarcity of biosynthetic functions is also seen in
diverse endocellular and epicellular parasites 16-18,23.

Amino-acid metabolism. As many as 43 and 69 genes required for 
amino-acid biosynthesis are found in Helicobacter pylori and 
Haemophilus influenzae, respectively. In contrast, Mycoplasma
genitalium and Borrelia burgdorferi contain only glyA, which encodes
serine hydroxymethyltransferase. This gene is also found in R. 
prowazekii (Table 1). Serine hydroxymethyltransferase catalyses 
the conversion of serine and tetrahydrofolate into glycine and 
methylenetetrahydrofolate, respectively. A role in tetrahydrofolate 
metabolism may account for the ubiquity of glyA in bacteria. 

Seven genes normally associated with lysine biosynthesis (lysC,
asd, dapA, dapB, dapD, dapE and dapF) are also present in R.
prowazekii. The biosynthetic pathways leading to lysine, methionine
and threonine share the first two of these (lysC and asd). However,
none of the downstream genes for threonine biosynthesis are found
in R. prowazekii. Likewise, the lysine pathway is incomplete, and
lysA, which encodes the enzyme that converts meso-diaminopimelate
to lysine, is missing. The likely role of the upstream genes of this
pathway in R. prowazekii is the biosynthesis of diaminopimelate, an
essential envelope component. We have therefore classified these
genes as 'cell-envelope' genes (Table 1).
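
The reasoning in the preceding paragraph (a pathway that is present up to diaminopimelate but lacks its final step) can be illustrated with a minimal sketch. The gene names come from the text; the linear ordering of the pathway and the helper names `missing_steps` and `last_reachable` are simplifying assumptions made only for illustration, not the authors' annotation method.

```python
# Genes the text names for lysine biosynthesis, followed by lysA (final step).
# Treating the pathway as a simple linear list is an assumption for this sketch.
LYSINE_PATHWAY = ["lysC", "asd", "dapA", "dapB", "dapD", "dapE", "dapF", "lysA"]

# Genes the text reports as present in R. prowazekii.
PRESENT = {"lysC", "asd", "dapA", "dapB", "dapD", "dapE", "dapF"}


def missing_steps(pathway, present):
    """Return the pathway genes that are absent, in pathway order."""
    return [gene for gene in pathway if gene not in present]


def last_reachable(pathway, present):
    """Return the last gene reached before the first gap, or None."""
    reached = None
    for gene in pathway:
        if gene not in present:
            break
        reached = gene
    return reached


missing = missing_steps(LYSINE_PATHWAY, PRESENT)
endpoint = last_reachable(LYSINE_PATHWAY, PRESENT)
print("missing:", missing)
print("pathway proceeds as far as:", endpoint)
```

Under these assumptions the only gap is lysA, and the pathway proceeds as far as dapF, which is consistent with the text's reading that the retained genes serve diaminopimelate synthesis rather than lysine synthesis.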

We have identified other genes that are superficially involved in
the metabolism of amino acids, but which apparently function in
deamination pathways that divert amino acids into the tricarboxylic
acid (TCA) cycle. For example, there is aatA, encoding aspartate
aminotransferase, which catalyses the degradation of aspartate to
oxaloacetate and glutamate. tdcB encodes threonine deaminase,
which converts threonine into α-ketobutyrate. Another gene (ilvE)
encodes branched-chain-amino-acid aminotransferase, which converts
leucine, isoleucine or valine into glutamate. pccA and pccB
encode propionyl-CoA carboxylase, which converts propionyl-CoA,
an intermediate in the breakdown of methionine, valine and
isoleucine, into succinyl-CoA. The pccA and pccB gene products
show greatest similarity to the eukaryotic proteins that are located
in the mitochondrial matrix.

Nucleotide biosynthesis. No genes required for the de novo synthesis
of nucleosides have been found in the R. prowazekii genome.
However, four genes required for the conversion of nucleoside
monophosphates into nucleoside diphosphates (adk, gmk, cmk
and pyrH) are present. There are also two genes encoding
ribonucleotide reductase, which converts ribonucleoside diphosphates
into deoxyribonucleoside diphosphates. Nucleoside diphosphate
kinase (encoded by ndk), which converts NDPs and dNDPs to NTPs
and dNTPs, is also present in R. prowazekii. Finally, there is a
complete set of genes for the conversion of dCTP and dUTP into
TTP, including thyA, which codes for thymidylate synthase. Thus,
the R. prowazekii genome encodes all of the enzymes required for
the interconversion of nucleoside monophosphates into all of the
other required nucleotides. The nucleoside monophosphates are
probably imported from the eukaryotic host.
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Energy metabolism 

Early in its infectious cycle, R. prowazekii uses the ATP of the host
with the help of membrane-bound ATP/ADP translocases. However,
R. prowazekii is also capable of generating ATP, which may
compensate for the gradual depletion of cytosolic ATP later in the
infection. R. prowazekii's repertoire of genes involved in ATP
production and transport includes determinants for the TCA cycle,
the respiratory-chain complexes, the ATP-synthase complexes and
the ATP/ADP translocases (Table 1). Genes to support anaerobic
glycolysis are absent.
Pyruvate dehydrogenase. Pyruvate is imported into mitochondria
directly from the cytoplasm and converted into acetyl-CoA by
pyruvate dehydrogenase. The genes encoding three components
(E1-E3) of the pyruvate dehydrogenase complex are found in R.
prowazekii, indicating that it too uses cytosolic pyruvate. Pyruvate
dehydrogenase (E1) consists of two subunits (α and β) in R.
prowazekii, mitochondria and Gram-positive bacteria; the
corresponding genes are clustered in the genome. In contrast,
proteobacteria such as E. coli, Haemophilus influenzae and Helicobacter
pylori have a single subunit for the E1 component and these have
little similarity to the α and β subunits of the E1 component in R.
prowazekii and mitochondria (data not shown).

Two paralogous genes code for the dihydrolipoamide dehydrogenase
(E3) in R. prowazekii. One of these most resembles mitochondrial
homologues, whereas the other is most similar to
bacterial homologues (data not shown). The presence of several
paralogous gene families for pyruvate dehydrogenases complicates
attempts to reconstruct a genome phylogeny based on these genes.
ATP production. Genes encoding all enzymes in the TCA cycle are
found in R. prowazekii. Proton translocation is mediated by NADH
dehydrogenase (complex I), cytochrome reductase (complex III)
and cytochrome oxidase (complex IV). Several clusters of genes
code for components of the NADH dehydrogenase complex. Seven
of these genes (nuoJKLM and nuoGHI) are located near to each
other, but the order of genes is inverted relative to the order of this
cluster in E. coli. An additional set of five genes is grouped in the
order nuoABCDE, but the single genes nuoF and nuoN are distant
from both of these clusters. Several proteins in the cytochrome bc1
reductase complex, such as ubiquinol-cytochrome c reductase
(encoded by petA), cytochrome b (encoded by cytb) and
cytochrome c1 (encoded by fbcH), are present, as are several subunits
of the cytochrome oxidase complex.

The ATP-synthesizing complex is composed of the ATP[…]
F1 component (comprising five polypeptides, α, β, γ, ε and δ) and
the F0 component, a hydrophobic segment that spans the […]
mitochondrial membrane. The genes encoding these components
are normally clustered in one of the most highly conserved […]
structures in microbial genomes. In R. prowazekii, however, the
ATP-synthase genes encoding the α, β, γ, δ and ε subunits of the F1
complex (atpH, atpA, atpG, atpD and atpC) are clustered, in the
common order, but atpB, atpE and atpF, encoding the A, […]
chains of the F0 complex, are split from this cluster.

Replication, repair and recombination 

R. prowazekii has a smaller set of genes involved in DNA repair […]
than do free-living bacteria such as E. coli, Haemophilus influenzae
and Helicobacter pylori. Four genes have been identified that code
for the core structure of DNA polymerase III, which includes the α
(dnaE), ε (dnaQ), β (dnaN), γ and τ (dnaX) subunits. […]
subunits present in the E. coli DNA polymerase III are […]
from R. prowazekii, as well as from M. genitalium and B. burgdorferi.
Genes encoding DNA-repair mechanisms are similar in the […]
genomes of the parasites R. prowazekii, M. genitalium and
B. burgdorferi. Thus, genes involved in the repair of ultraviolet-
induced DNA damage (uvrABCD) have been identified in […]
genomes. In R. prowazekii, DNA-excision repair probably occurs […]
a pathway involving endonuclease III, polI and DNA ligase […]
B. burgdorferi.

The R. prowazekii genome has a limited capacity for […]
repair. The DNA-mismatch-repair enzymes encoded by mutL and […]
are present, but mutH and mutY are not. There is a complete lack […]
genes in M. genitalium, but mutL and mutHLY have been identified
in B. burgdorferi and Chlamydia trachomatis. The transcription-
repair coupling factor (encoded by mfd) is found in R. prowazekii,
B. burgdorferi and C. trachomatis but not in M. genitalium.

The R. prowazekii genome contains several genes involved in
homologous recombination, such as recA, recF, recJ, recN and […].
A similar set of genes has been found in A. aeolicus 2[…]. The […]
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Ottered in the other small genomes of parasites. The RecBCD 
complex is missing in R. prowazekii, M. genitalium and Helicobacter
pylori, but it has been identified in B. burgdorferi.

Transcription and translation

R. prowazekii has three subunits (α, β and β') of the core RNA
polymerase, as well as σ70 and one alternative σ factor, σ32, which
controls transcription of the genes encoding heat-shock proteins in
E. coli. Genes involved in transcription elongation and termination,
[…] nusB, nusG, greA and rho, are also present. The gene encoding
[…] is absent in most other small genomes, such as those of
B. burgdorferi, Helicobacter pylori, M. genitalium and C. trachomatis,
although genes for heat-shock proteins are present.
An unusually large number of genes involved in RNA degradation
are found in R. prowazekii. Of these, only four appear to be common
to the bacterial genomes analysed so far (those encoding
polyribonucleotide nucleotidyltransferase and ribonucleases HII, III and P).
Four more ribonucleases (D, E, HI and PH) are present in R.
prowazekii, but in none of the other small parasites.
Of the 33 identified tRNA genes, which code for 32 different
tRNA isoacceptor species, two code for tRNA-Phe. There are two
tRNA species for most of the amino acids that are encoded by
four-codon boxes; the exceptions are the four-codon boxes for proline
and valine, for which we have identified only one isoacceptor-tRNA
species, with U in the first anticodon position. selC, which codes for
[…], and selABD are missing. R. prowazekii has a set of genes
coding for tRNA modifications (tgt, queA, trmD, truA, truB and
[…]) which resembles that of Helicobacter pylori, C. trachomatis
and B. burgdorferi; M. genitalium has only trmD and truA.
In R. prowazekii, 21 genes encode 18 of the 20 aminoacyl-tRNA
synthetases normally required for protein synthesis. There are two
genes (gltX) encoding glutamyl-tRNA synthetase. As seen in several
bacterial genomes 25, the gene coding for glutaminyl-tRNA
synthetase, glnS, is missing. Three genes encoding subunits of the
glutamyl-tRNA amidotransferase are present, indicating that a
glutamyl-tRNA charged with glutamic acid may be transamidated
to generate Gln-tRNA. The gene coding for asparaginyl-tRNA
synthetase, asnS, is also missing from the R. prowazekii genome as
well as from Helicobacter pylori, C. trachomatis and A. aeolicus 26. A
transamidation process to form Asn-tRNA(Asn) from Asp-tRNA(Asn)
has been proposed for the archaeon Haloferax volcanii 27 and this
reaction may also occur in R. prowazekii. The valyl-tRNA synthetase
is 58.3% identical to its homologue in Methanococcus jannaschii,
but only 27.6% identical to its most similar homologue in bacteria,
which is found in Bacillus stearothermophilus, possibly indicating a
horizontal transfer event. The lysyl-tRNA synthetase (encoded by
[…]) in R. prowazekii is a class I enzyme with no resemblance to the
conventional class II lysyl-tRNA synthetases. Class I type of
lysyl-tRNA synthetases have been observed previously in only
B. burgdorferi, Pyrococcus woesii, Methanococcus jannaschii and a
few other methanogens 26.
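
The percent-identity figures quoted above (for example 58.3% against 27.6% for the valyl-tRNA synthetase) come from pairwise sequence comparison. The text does not say how the denominator was chosen, so the following is only a minimal sketch of one common convention: identical residues divided by aligned columns in which neither sequence has a gap. The function name `percent_identity` and the toy sequences are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns where either sequence has a gap are skipped; this choice of
    denominator is an assumption, not necessarily the one used in the paper.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment (not real synthetase sequences): 4 matches in 5 gap-free columns.
toy_identity = percent_identity("MKV-LA", "MRVGLA")
print(toy_identity)
```

Other conventions (dividing by the shorter sequence length or by the full alignment length) give different numbers for the same alignment, which is worth keeping in mind when comparing identity values across studies.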



Regulatory systems

Like other genomes of small parasites, R. prowazekii has a reduced
set of regulatory genes. There are a few members of two-component
regulatory systems, such as the proteins encoded by barA, envZ,
[…], ompR and phoR. spoT, which is involved in the stringent
response, has been identified in B. burgdorferi, Helicobacter pylori
and M. genitalium. Only remnants of genes coding for amino-
terminal fragments of proteins similar to those encoded by spoT and
[…] are identifiable in R. prowazekii. No fragments of spoT encoding
carboxy-terminal segments of the protein have been identified
in the genome.

Cell division and protein secretion

Genes involved in detoxification, such as superoxide dismutase,
and those involved in thiophene and furan oxidation are present in R.
prowazekii. Two genes encoding haemolysins have also been identified,
and an R. typhi homologue of tlyC exhibits haemolytic
activities when expressed in E. coli (S. Radulovic, J. M. Troyer,
B. Noden, S.G.E.A. and A. Azad, unpublished observations).

The data indicate that the basic mechanisms of cell division and
secretion in R. prowazekii are similar to those in free-living
proteobacteria. There is a common set of bacterial chaperones (encoded by
dnaK, dnaJ, hslU, hslV, groEL, groES and htpG) and genes
involved in the secA-dependent secretory system (secABDEFGY, ffh
and ftsY). R. prowazekii has a significantly larger set of genes
involved in peptide secretion than does M. genitalium.

Membrane-protein analysis 

Many studies of R. prowazekii have focused on outer-surface
membrane proteins because of their potential importance in
bacterial detection and vaccination. The superficial lipopolysaccharide
(LPS) molecule is important in the pathogenesis of R. prowazekii.
LPS consists of a polysaccharide that is covalently linked to lipid A,
the biosynthesis of which is catalysed by products of lpxABCD, all of
which are present in the R. prowazekii genome. These genes are
clustered in E. coli, but lpxA and lpxD are separate from lpxB and
lpxC in R. prowazekii. Three genes involved in the biosynthesis of the
3-deoxy-D-manno-octulosonic acid (KDO) residues reside in the R.
prowazekii genome (kdsA, kdsB and kdtA). Only one gene (rfaJ) with
a putative function in outer-core biosynthesis has been identified.

We have identified a set of genes involved in the biosynthesis of
murein and diaminopimelate and a set involved in the biosynthesis
of fatty acids. These include: fabD, which is involved in the last step
of the initiation phase of fatty-acid biosynthesis; four genes involved
in the elongation cycle of fatty-acid biosynthesis (fabFGHI); and
three genes involved in the first three steps of the synthesis of polar
head groups (cdsA, pssA and pgsA). Finally, post-translational
processing and addition of lipids to an N-terminal cysteine require
the gene products prolipoprotein diacylglycerol transferase (lgt),
prolipoprotein signal peptidase (lspA) and apolipoprotein:phospholipid
N-acyl transferase (lnt). These are found in the genome with
several genes involved in the degradation of fatty acids, such as fadA,
which encodes the 3-ketoacyl-CoA thiolase.

Virulence 

The R. prowazekii genome contains several homologues of the VirB
gene operon found in Agrobacterium tumefaciens. This gene family
encodes proteins that direct the export of the T-DNA-protein
complex across the bacterial envelope to the plant nuclei 28. R.
prowazekii has two homologues of VirB4 and one homologue
each of VirB8, VirB9, VirB10, VirB11 and VirD4. The latter five
genes are clustered with the gene trbG, which is involved in
conjugation in Agrobacterium tumefaciens. Homologues of the
single-stranded DNA-binding proteins VirD2 and VirE2 are missing.
In Agrobacterium tumefaciens, these proteins are bound to the
transferred T-DNA, indicating different functions for the
homologues of the VirB genes in R. prowazekii. Indeed, VirB proteins are
homologous to components of the E. coli transport system for
plasmids, as well as to components of the Ptl transport machinery
in Bordetella pertussis, which exports pertussis toxin 28. A set of genes
coding for VirB4 and several other VirB proteins has been identified
in the cag pathogenicity island of Helicobacter pylori. In this species,
the VirB proteins facilitate export of a factor that induces
interleukin-8 secretion in gastric epithelial cells 28. Thus, R. prowazekii
may encode components of a transport system for both conjugal
DNA transfer and protein export.

The virulence of Staphylococcus aureus has been correlated with
the production of capsular polysaccharides in phagocytic assays and
mouse lethality assays 29,30. A cluster of ten capsule genes (capA-M)
is involved in capsule biosynthesis in S. aureus strain M 3. We have
identified three R. prowazekii genes with sequence similarities to S.
aureus cap genes. Two of these (capD and capM) are separated by ten
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genes, most of which are unknown genes or genes involved in the 
biosynthesis of LPS or teichoic acid. Thus, R. prowazekii may
produce components of a microcapsular layer that is involved in 
virulence. 

Reductive evolution 

Genome sequences of organisms enjoying an endosymbiotic life- 
style are at risk. The activities of homologous nuclear genes may 
render genes of the endosymbiont expendable and as a consequence 
they become vulnerable to obliteration by mutation. Good candi- 
dates for such purged genes in Rickettsia and mitochondria are 
genes required for amino-acid biosynthesis, nucleoside biosynthesis 
and anaerobic glycolysis. These and other genes would have been
deleted when an ancestral genome first lived in a nucleated cell. 
Once genes essential to a free-living mode are lost, the endosym- 
biont becomes an obligate resident of its host. 

Likewise, small, bottle-necked populations of bacteria infecting a
eukaryotic cell will tend to accumulate deleterious mutations
because selection cannot remove them from such clonal
populations 13. The accumulation of such harmful but non-lethal
mutations is referred to as 'Muller's ratchet' 32 or 'near-neutral
evolution' 33,34. The consequence of accumulation of these mutations
will be the inactivation and eventual deletion of non-essential genes.
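
The ratchet-like behaviour described above can be sketched with a toy simulation. This is not a model from the paper: it assumes a small asexual population of fixed size, no selection, no recombination and no back mutation, and every parameter value and name (`ratchet`, `pop_size`, `mutation_prob`) is a hypothetical choice for illustration.

```python
import random


def ratchet(pop_size=20, generations=50, mutation_prob=0.3, seed=1):
    """Track the smallest mutation load in a small clonal population.

    Each offspring copies the load of a randomly chosen parent and gains
    one new deleterious mutation with probability `mutation_prob`.
    Returns the minimum load per generation, starting with generation 0.
    """
    rng = random.Random(seed)
    population = [0] * pop_size      # everyone starts mutation-free
    least_loaded = [0]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent_load = rng.choice(population)
            new_mutation = 1 if rng.random() < mutation_prob else 0
            offspring.append(parent_load + new_mutation)
        population = offspring
        least_loaded.append(min(population))
    return least_loaded


history = ratchet()
print(history[0], "->", history[-1])
```

Because loads are only inherited or increased, the least-loaded class can be lost by chance but never regained, so the recorded minimum never decreases; that one-way movement is the "click" of the ratchet.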

The first mutation that inactivates an expendable gene is likely to
initiate a sequence of events in which subsequent mutations freely
transform it, by degrees, from a pseudogene, to unrecognizable
sequence, to small fragments, to extinction. In this sequence,
mutations are released from amino-acid-coding constraints. Thus
nucleotide substitutions will reflect the mutation bias of the
genome. This bias can be estimated roughly by frequencies of
third-position bases in the codons. For R. prowazekii, the bias of
the third-position bases is 18% G+C rather than the 29% G+C
average for the genome. So, as sequences age in R. prowazekii, their
composition should gradually approach the low G+C content of
third codon positions. Nearly one-quarter of the R. prowazekii
genome is composed of non-coding sequences, with a G+C content
lower than that of coding sequences (25% G+C compared to 30%;
P < 0.001). Thus, much of the non-coding sequence may be
remnants of coding sequences that are in the process of being
eliminated from the genome.
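
The two quantities compared in this paragraph, overall G+C content and G+C content at third codon positions, can be computed as in the minimal sketch below. The function names `gc_fraction` and `gc3_fraction` and the toy coding sequence are assumptions for illustration; the sketch assumes an in-frame coding sequence and ignores ambiguous bases.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_fraction(cds: str) -> float:
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    third_positions = cds[2::3]   # indices 2, 5, 8, ... are codon position 3
    return gc_fraction(third_positions)


# Toy coding sequence (hypothetical): ATG GCA TTA GCT TAA
toy_cds = "ATGGCATTAGCTTAA"
overall = gc_fraction(toy_cds)
third = gc3_fraction(toy_cds)
print(f"overall G+C = {overall:.2f}, third-position G+C = {third:.2f}")
```

In this toy sequence the third-position value is lower than the overall value, which is the same direction as the 18% against 29% contrast reported for R. prowazekii, although the toy numbers themselves carry no biological meaning.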

The gene encoding S-adenosylmethionine synthetase (metK),
which catalyses the biosynthesis of S-adenosylmethionine (SAM),
illustrates the initiation of this process. The metK sequence in the
strain of R. prowazekii studied here has a termination codon within
a region of the gene that is otherwise highly conserved among
bacterial species 35. However, a closely related strain does not have
the termination codon. Many other defects, such as termination
codons, insertions, and a preponderance of small deletions, have
also been observed in the metK genes in several members of the
spotted fever group Rickettsia (J.O.A. and S.G.E.A., unpublished
observations). This random distribution of lethal mutations […]
some metK alleles from different Rickettsia species indicates that this
gene may have just entered the extinction process. This distribution,
and the identification of 11 more pseudogenes for carboxypeptidase
(ypwA), penicillin-binding protein (pbpC), succinyl CoA-transferase
(scoB), transposase (tra3), resolvase (pin), conjugative […]
protein (toxB), a hypothetical protein (yfcI) and four […]
fragmented open reading frames for (p)ppGpp 3'-pyrophosphohydrolase,
indicates that the R. prowazekii genome continues to
eliminate genes.

Genome sequences can be purged by a more abrupt mechanism.
This consists of intrachromosomal recombination at duplicated
sequences, which can result in the deletion of intervening sequences,
the loss of a sequence duplication and the rearrangement of […]
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Figure 4 Histogram representation of the similarity of predicted ft prowazekii 
proteins to yeast proteins targeted to the mitochondria. Only protein pairs with per 
cent identity values greater than 25% are shown. Numbers in parentheses 
represent the total number of yeast mitochondrial proteins within each category. 
The yeast mitochondrial protein sequences have been taken from
http://www.proteome.com.
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Figure 5 The organization and phylogenetic relationships of gene en<^ 
ribosomal protein from R. prowazekii and the mitochondrial genome of
Reclinomonas americana. a, The organization of ribosomal-protein genes. The
S10, spc and α-operons are organized similarly in these two genomes, except that
several ribosomal-protein genes 38 have been deleted from the mitochondrial
genome of Reclinomonas americana. b, The phylogenetic relationships of
mitochondria and bacteria were derived from the combined amino-acid
sequences of ribosomal proteins S2, S3, S7, S10, S11, S12, S13, S14, S19, […]
and L16. Neighbour-joining and maximum-parsimony methods gave identical
topologies. Branch lengths are proportional to those reconstructed by using the
neighbour-joining method. Values at nodes are bootstrap values indicating the
degree of support for individual clusters under each method (neighbour-joining/
maximum parsimony). Only bootstrap values >90% are shown.
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tes. Such a mechanism will account for the presence in ft 
prowazekii of one, unlinked copy of rrs and rrl, both of which are
surrounded by new flanking sequences 36. Likewise, R. prowazekii has
one tuf gene and one fus gene in atypical clusters that seem to have
been created by intrachromosomal recombination between the two
tuf genes that are normally found in Gram-negative bacteria 37.
Such rearranged gene operon structures encoding ribosomal
proteins are characteristic of all members of the genus Rickettsia
([…] C.A. and S.G.E.A., unpublished observations).
Conserved operons that are found in free-living bacteria are often
scattered throughout the Rickettsia genome (see above). The R.
prowazekii genome contains an unusually small fraction of repeat
sequences (<10% of that observed in free-living bacteria). We
suggest that the repeat sequences found in the ancestor to the
Rickettsia have been 'consumed' by the intrachromosomal-recombination
mechanism that generated some of the deletions and
rearrangements seen in R. prowazekii. Such intrachromosomal
recombinants arise at a substantial rate in bacteria growing in
culture, but here they are eliminated from the populations by
selection. That such remnants of intrachromosomal recombination
are retained in R. prowazekii indicates that purifying selection has
been attenuated in this organism.

Mitochondrial affinities

Reduction in genome size in mitochondria and Rickettsia is
[…] to have occurred independently in the two lineages. Most of
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the genes supporting mitochondrial activities are nuclear. Many of 
the 300 proteins encoded in the nucleus of the yeast Saccharomyces 
cerevisiae but destined for service within the mitochondrion are 
close homologues of their counterparts in R. prowazekii. Nearly
one-quarter of these proteins are required for bioenergetic processes
and another one-third of them are required for the expression of the
genes encoded in the mitochondrial genome. In total, more than
150 nucleus-encoded mitochondrial proteins share significant
sequence homology with R. prowazekii proteins (Fig. 4).

Another group of 58 nucleus-encoded mitochondrial proteins 
represents components of the mitochondrial transport machinery 
and regulatory system (Fig. 4). These include proteins found in the 
mitochondrial outer membrane and others involved in splicing 
reactions. Such proteins have probably been secondarily recruited to 
mitochondria from genomes not necessarily related to that of the
α-proteobacterial ancestor.

The mitochondrial genome of the early diverging, freshwater 
protozoan Reclinomonas americana is more like that of a bacterium 
than any other mitochondrial genome sequenced so far 38 . This 
genome contains 67 protein-coding genes, most of which provide 
components of genetic processes and the bioenergetic system 38 . 
Several gene clusters in this mitochondrial genome are reminiscent 
of those in bacteria (Figs 5a, 6a). Most similarities represent 
retained, ancestral traits present in the common ancestor of bacteria 
and mitochondria. For example, the genes rplKAJL and rpoBC are
identically organized in R. prowazekii and the mitochondrial
genome of Reclinomonas americana. Likewise, the genes encoding
the S10, spc and the α-ribosomal protein operons are organized
similarly in the two genomes. The immediate proximity of these two
clusters in the Reclinomonas americana mitochondrial DNA is
reminiscent of the arrangement in free-living bacteria, whereas
the physical separation of the two clusters in the R. prowazekii
genome is atypical. A further rearrangement event is indicated by
the fact that the rpsLrpsGfus cluster is located upstream of the
rplKAJLrpoBC cluster in R. prowazekii, rather than downstream as it
is in the Reclinomonas americana mtDNA. Phylogenetic
reconstructions based on ribosomal proteins within each of these two clusters
indicate that there is a close evolutionary relationship between R.
prowazekii and mitochondria (Fig. 5b).

Mitochondria and R. prowazekii have a similar repertoire of
proteins involved in ATP production and transport, including
genes encoding components of the TCA cycle, the respiratory-
chain complexes, the ATP-synthase complexes and the ATP/ADP
translocases. There are some similarities in the gene orders of some
functional clusters (Fig. 6a). There are also some rearrangements of
clusters that are specific to Rickettsia. One example is the inversion
of segments corresponding to nuoJKLM and nuoGHI. Another is the
scattered displacement of genes involved in the biogenesis of
cytochrome c. Nevertheless, phylogenetic reconstructions based
on components of the NADH dehydrogenase complexes indicate
that there is a close evolutionary relationship between R. prowazekii
and mitochondria (Fig. 6b).

We have identified as many as five genes coding for ATP/ADP 
transporters, all of which are expressed (R.M.P. et al, unpublished 
observations). The Rickettsia ATP/ADP translocases are monomers 
with 12 transmembrane regions each, whereas the mitochondrial 
translocases are dimers with six transmembrane regions per dimer. 
We found no relationship between the primary structures of the 
mitochondrial and Rickettsia ATP/ADP translocases, indicating that 
these transport systems may have originated independently. 

The study of the R. prowazekii genome sequence supports the idea 
that aerobic respiration in eukaryotes originated from an ancestor 
of the Rickettsia, as indicated previously by phylogenetic reconstructions 
based on the rRNA gene sequences 7-9 . Phylogenetic analyses of 
the petB and coxA genes indicate that the respiration systems of 
Rickettsia and mitochondria diverged ~1,500-2,000 million years 
ago 10 , shortly after the amount of oxygen in the atmosphere began 
to increase. The finding that the ATP/ADP translocases in R. 
prowazekii and mitochondria are of different evolutionary origin 
is problematic (R.M.P. et al., unpublished observations). Free-living 
bacteria do not seem to have homologues of ATP/ADP translocases, 
which are found only in organelles and in two obligate intracellular 
parasites, Rickettsia and Chlamydia. Thus it is not known whether 
the original endosymbiont was capable of efficient exchange of 
adenosine nucleotides with its host cell. More detailed comparative 
analysis of the genomes of α-proteobacteria may refine our understanding 
of the origins of mitochondria. 

Methods 

Genome sequencing. We prepared genomic DNA from the Madrid E strain of 
R. prowazekii, which was originally isolated in Madrid from a patient who died 
in 1941 with epidemic typhus. We propagated R. prowazekii in the yolk sac of 
embryonated hen eggs and purified DNA according to standard procedures 39 . 
We sequenced the R. prowazekii genome by a whole-genome shotgun approach 
in combination with shotgun sequencing of a selected set of clones from a 
cosmid library (A.Z. et al., unpublished observations). Genomic and cosmid 
DNA was sheared by nebulization to an average size of ~2 kb. The random 
fragments were cloned into a modified M13 vector using the double adaptor 
method 40 . We collected 19,078 sequence reads during the random sequencing 
phase using Applied Biosystems 377 DNA sequencers (Perkin-Elmer). 

The sequences were assembled and the consensus sequence was edited using 
the STADEN program 41 . We verified the structure of the assembled sequence by 
end-sequencing of 3-kb-insert λ ZAP II clones 36 , 10-kb λ clones and 30-kb 
cosmid clones. More than 97% of the genome was covered by clones from the 
three different libraries (A.Z. et al., unpublished observations). Gaps between 
contigs were closed by direct sequencing of clones from the three libraries or of 
polymerase chain reaction (PCR) products. The final four gaps were closed by 
direct sequencing of PCR products generated with the Long Range PCR system 
(GeneAmp). Regions of ambiguity were identified by visual inspection of the 
assembly and resequenced. The final assembly contains ~20,000 sequences. 
The genome sequence has eightfold coverage on average and no single region 
has less than twofold coverage. We estimate the overall error frequency to be 
<1 × 10⁻⁵. 

Informatics. Sequence analysis and annotation were managed by CapDB (T.S.-P. 
et al., unpublished observations). We identified open reading frames of more 
than 50 codons as genes on the basis of their characteristic patterns in 
nucleotide-frequency statistics 14 using BioWish 42 . The identified genes were 
analysed using the program BLASTX 43 to search for sequence similarities in 
EMBL, TREMBL, SwissProt and in-house databases. We identified tRNA genes 
with the program tRNAscan-SE 44 . Remaining frameshifts were considered to 
be authentic and annotated as pseudogenes. Families of paralogues were 
constructed using BLAST to search for sequence similarities within the R. 
prowazekii genome. Multiple alignments and phylogenetic trees for genes with 
significant sequence similarities to genes in the public databases were 
constructed automatically using CLUSTAL-W 45 , Phylo_win 46 and GRS 47 . The final 
annotation was based on manual inspection of the phylogenetic placement of 
R. prowazekii in the resulting gene trees. 
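
The length criterion used in the Informatics paragraph (open reading frames of more than 50 codons) can be illustrated with a minimal sketch. It is not the BioWish method and does not use nucleotide-frequency statistics; it only scans the six reading frames for ATG-to-stop stretches and applies the length cutoff. The function names, the choice of ATG as the only start codon and the coordinate convention are assumptions made for illustration.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 51) -> Iterator[Tuple[str, int, int]]:
    """Yield (strand, start, end) for ATG..stop ORFs in all six frames.

    min_codons counts the coding codons (stop codon excluded), so the
    default of 51 keeps ORFs of "more than 50 codons". Coordinates are
    0-based, half-open, and relative to the strand that was scanned.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3
                    start = None


# Hypothetical toy sequences: one ORF of 61 codons and one of 11 codons.
long_orf = "ATG" + "GCT" * 60 + "TAA"
short_orf = "ATG" + "GCT" * 10 + "TAA"
print(list(find_orfs(long_orf)))
print(list(find_orfs(short_orf)))
```

In practice a length filter alone over-predicts genes, which is presumably why the authors combined it with frequency statistics and database searches.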



WARNING: These microbial genomes from are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 



NOTE: 



This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 



Commencing search, please wait for results. 



You have searched a database generously provided by the Institute for Genomic Research 
(TIGR). Their Policy on Early Data Release is: 



The Institute for Genomic Research (TIGR) releases data very rapidly to ensure that our scientific colleagues have access to 
information that may assist them in the search for genes and their biological function. Data releases do not constitute scientific 
publication, but rather provide investigators with information that may "jump- start" biological experimentation. Users of this 
information are encouraged to share their results with TIGR in order to improve annotation of the sequence data. Data or 
information may contain errors or be incomplete and should be regarded as preliminary. 

TIGR asks that you acknowledge the source of information obtained from this site in any publication by including the following 
sentence in both the Materials and Methods and Acknowledgement sections: "Preliminary sequence data was obtained from The 
Institute for Genomic Research website at http://www.tigr.org " Also include the following text in the Acknowledgements, if 
applicable: "Sequencing of [organism name] was accomplished with support from [funding agency]." The name of the funding 
agency for each TIGR project can be found at http://www.tigr.org/tdb/mdb/mdb.html 

Similarly, if you display this data or any information derived from it on a Web page, we ask that you prominently display the 
following notice on that webpage: "Preliminary sequence data was obtained from The Institute for Genomic Research website at 
http://www.tigr.org" We request that you notify us of your electronic presentation by sending email to www@tigr.org. 



TBLASTN 2.0.8 [Jan-05-1999] 



Reference: 

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= 

(334 letters) 

Searching done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQs 



                                                                    Score     E
Sequences producing significant alignments:                         (bits)  Value

gb|U00096|ECOLI Escherichia coli K-12 MG1655 complete genome           614  e-175
gnl|Sanger|S.typhi_Contig369 Salmonella typhi unfinished fragmen...    490  e-138
gnl|Sanger|Y.pesits_Contig315 Yersinia pestis unfinished fragmen...    284  2e-76
gnl|CBCUMN|Pmultocida.990513.Contig500 Pasteurella multocida PM7...    175  2e-43
gnl|CBCUMN|PMultocida.990407.Contig485 Pasteurella multocida PM7...    174  3e-43
gb|L42023|L42023 Haemophilus influenzae Rd complete genome             153  6e-37
gnl|CBCUMN|F8P5 Pasteurella multocida PM70 unfinished fragment o...    141  4e-33
gnl|TIGR|V.cholerae_asm894 Vibrio cholerae unfinished fragment o...    123  6e-28
gnl|PAGP|Paeruginosa_Contig50 Pseudomonas aeruginosa unfinished ...    115  2e-25
gnl|OUACGT|A.actin_Contig398 Actinobacillus actinomycetemcomitan...    115  2e-25
gnl|TIGR|S.putrefaciens_gsp_271 Shewanella putrefaciens unfinish...     95  3e-19
gnl|Sanger|N.mening_Contig4 Neisseria meningitidis serogroup A u...     89  2e-17
gnl|OUACGT|Ngon_Contig191 Neisseria gonorrhoeae unfinished fragm...     87  6e-17
gnl|TIGR|D.radiodurans_8842 Deinococcus radiodurans unfinished f...     76  1e-13
gnl|Sanger|B.pertussis_Contig654 Bordetella pertussis unfinished...     73  1e-12
gnl|OUACGT|Ngon_Contig223 Neisseria gonorrhoeae unfinished fragm...     73  1e-12
gnl|Sanger|N.mening_Contig3 Neisseria meningitidis serogroup A u...     73  1e-12
emb|AL123456|MTBH37RV Mycobacterium tuberculosis H37Rv complete ...     69  2e-11
gnl|TIGR|gmt3711 Mycobacterium tuberculosis unfinished fragment ...     69  2e-11
gnl|Sanger_1765|mbovis_Contig1041.0 Mycobacterium bovis unfinish...     69  2e-11
gb|AE000657|AE000657 Aquifex aeolicus complete genome                   68  6e-11
gnl|Sanger|B.pertussis_Contig889 Bordetella pertussis unfinished...     64  5e-10
gnl|TIGR|t_ferrooxidans_1986 Thiobacillus ferrooxidans unfinishe...     64  5e-10
gnl|TIGR|C.tepidum_gct_9 Chlorobium tepidum unfinished fragment ...     64  7e-10
gnl|TIGR|gef_6277 Enterococcus faecalis unfinished fragment of c...     63  2e-09
gnl|Sanger|Y.pesits_Contig790 Yersinia pestis unfinished fragmen...     63  2e-09
gnl|TIGR|C.trachomatis_ct_97 Chlamydia trachomatis MOPN unfinish...     62  2e-09
gnl|PAGP|Paeruginosa_Contig53 Pseudomonas aeruginosa unfinished ...     62  3e-09
gnl|TIGR|N.meningitidis_GNMCF18R Neisseria meningitidis MC58 unf...     61  6e-09
gnl|TIGR|T.maritima_tm_26 Thermotoga maritima unfinished fragmen...     61  6e-09
gnl|TIGR|P.gingivalis_1194 Porphyromonas gingivalis W83 unfinish...     60  1e-08
gnl|Sanger|S.typhi_Contig376 Salmonella typhi unfinished fragmen...     59  2e-08
gnl|OUACGT|Spyogenes_Contig243 Streptococcus pyogenes unfinished...     59  2e-08
gnl|TIGR|S.putrefaciens_gsp_387 Shewanella putrefaciens unfinish...     59  2e-08
gnl|TIGR|M.avium_5593 Mycobacterium avium unfinished fragment of...     58  4e-08
emb|AL009126|BSUB Bacillus subtilis complete genome                     58  4e-08
gnl|UOKNOR|S.mutans_Contig840 Streptococcus mutans unfinished fr...     58  4e-08
gb|AE001273|AE001273 Chlamydia trachomatis complete genome              58  5e-08
gnl|TIGR|C.crescentus_gcc_764 Caulobacter crescentus unfinished ...     57  9e-08
gnl|Sanger|campylo_Cj.seq Campylobacter jejuni NCTC 11168 unfini...     57  1e-07
gnl|OUACGT|A.actin_Contig753 Actinobacillus actinomycetemcomitan...     56  2e-07
gnl|TIGR|gmt3732 Mycobacterium tuberculosis unfinished fragment ...     56  2e-07
gnl|GTC|C.aceto_AE001437 Clostridium acetobutylicum, WORKING DRA...     55  3e-07
gnl|TIGR|S.pneumoniae_sp_36 Streptococcus pneumoniae unfinished ...     55  4e-07
gnl|TIGR|V.cholerae_asm864 Vibrio cholerae unfinished fragment o...     54  6e-07
gnl|TIGR|P.gingivalis_1209 Porphyromonas gingivalis W83 unfinish...     54  8e-07
gnl|Sanger_1765|mbovis_Contig454.1 Mycobacterium bovis unfinishe...     53  1e-06
gb|AE000520|AE000520 Treponema pallidum complete genome                 53  1e-06
gb|AE000511|HPYL Helicobacter pylori 26695 complete genome              52  4e-06
gnl|TIGR|S.aureus_2202 Staphylococcus aureus COL unfinished frag...     51  5e-06
gnl|OUACGT|S.aureus_Contig1164 Staphylococcus aureus unfinished ...     51  5e-06
gb|AE001439|AE001439 Helicobacter pylori, strain J99 complete ge...     50  1e-05
gnl|TIGR|N.meningitidis_GNMAB03R Neisseria meningitidis MC58 unf...     50  1e-05
gnl|TIGR|S.pneumoniae_sp_68 Streptococcus pneumoniae unfinished ...     49  2e-05
gnl|TIGR|C.tepidum_gct_35 Chlorobium tepidum unfinished fragment...     48  4e-05
gnl|TIGR|t_ferrooxidans_64 Thiobacillus ferrooxidans unfinished ...     48  4e-05
gnl|OUACGT|Ngon_Contig196 Neisseria gonorrhoeae unfinished fragm...     47  7e-05
gnl|TIGR|D.radiodurans_8813 Deinococcus radiodurans unfinished f...     47  7e-05
gb|AE000783|AE000783 Borrelia burgdorferi complete genome               47  7e-05
gnl|TIGR|t_ferrooxidans_1967 Thiobacillus ferrooxidans unfinishe...     46  2e-04
gnl|TIGR|gef_6250 Enterococcus faecalis unfinished fragment of c...     45  3e-04



emb|AJ235269|RPXXO Rickettsia prowazekii strain Madrid E, comple...
gnl|OUACGT|Spyogenes_Contig260 Streptococcus pyogenes unfinished...
gnl|OUACGT|Ngon_Contig166 Neisseria gonorrhoeae unfinished fragm...
gnl|UOKNOR|S.mutans_Contig762 Streptococcus mutans unfinished fr...
gnl|TIGR|C.trachomatis_ct_26 Chlamydia trachomatis MOPN unfinish...
gnl|TIGR|S.aureus_2184 Staphylococcus aureus COL unfinished frag...
gnl|CBCUMN|Pmultocida.990513.Contig705 Pasteurella multocida PM7...
gb|AB001339|SYNECHO Synechocystis PCC6803 complete genome
gnl|TIGR|M.avium_5418 Mycobacterium avium unfinished fragment of...
gnl|TIGR|C.crescentus_gcc_2104 Caulobacter crescentus unfinished...
gnl|TIGR|V.cholerae_asm959 Vibrio cholerae unfinished fragment o...
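
BLAST of this vintage prints very small E values in an abbreviated form, for example e-175 rather than 1e-175, which a standard number parser rejects. The sketch below shows one way to read such strings and to filter hits by a cutoff. The function names and the cutoff of 1e-10 are illustrative assumptions, not part of the BLAST report.

```python
from typing import Iterable, List, Tuple


def parse_evalue(text: str) -> float:
    """Convert a BLAST E-value string such as 'e-175', '2e-76' or '0.004' to a float."""
    text = text.strip().lower()
    if text.startswith("e"):
        # 'e-175' is shorthand for '1e-175'
        text = "1" + text
    return float(text)


def significant(hits: Iterable[Tuple[str, str]], threshold: float = 1e-10) -> List[str]:
    """Return the names of hits whose E value is at or below the threshold."""
    return [name for name, evalue in hits if parse_evalue(evalue) <= threshold]


# A few rows taken from the hit list above (name, E value as printed).
rows = [
    ("Escherichia coli K-12 MG1655", "e-175"),
    ("Yersinia pestis Contig315", "2e-76"),
    ("Helicobacter pylori 26695", "4e-06"),
]
print(significant(rows))
```

The threshold is a judgment call; the list above runs from e-175 down to 3e-04, and where to draw the line between homologue and chance match is left to the reader.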

gb|U00096 | ECOLI Escherichia coli K-12 MG1655 complete genome 
Length = 4639221 

Score = 614 bits (1566), Expect = e-175 

Identities = 296/334 (88%), Positives = 296/334 (88%) 

Frame = +3 


Query: 1       MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG 60
               MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG
Sbjct: 1154985 MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG 1155164

Query: 61      HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXX 120
               HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWV
Sbjct: 1155165 HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVTDAALL 1155344

Query: 121     XXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREV 180
                           EEPPAETWFFLATREPERLLATLRSRCRLHYLA PPEQYAVTWLSREV
Sbjct: 1155345 TDAAANALLKTLEEPPAETWFFLATREPERLLATLRSRCRLHYLAPPPEQYAVTWLSREV 1155524

Query: 181     TMSQDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQ 240
               TMSQD                   FQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQ
Sbjct: 1155525 TMSQDALLAALRLSAGSPGAALALFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQ 1155704

Query: 241     APARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLSPSRLQAILGDVCHIREQL 300
               APARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLSPSRLQAILGDVCHIREQL
Sbjct: 1155705 APARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLSPSRLQAILGDVCHIREQL 1155884

Query: 301     MSVTGINRELLITDLLLRIEHYLQPGVVLPVPHL 334
               MSVTGINRELLITDLLLRIEHYLQPGVVLPVPHL
Sbjct: 1155885 MSVTGINRELLITDLLLRIEHYLQPGVVLPVPHL 1155986
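
Because TBLASTN compares a protein query against translated nucleotide sequence, the Sbjct coordinates above count nucleotides while the Query coordinates count amino acids, so each subject span should be three times the number of aligned residues. The short sketch below checks that relationship; the function name is an illustrative assumption.

```python
def translated_length(sbjct_start: int, sbjct_end: int) -> int:
    """Number of codons covered by an inclusive nucleotide span.

    Works for either strand: on a reverse-frame hit the start
    coordinate is larger than the end coordinate.
    """
    span = abs(sbjct_end - sbjct_start) + 1
    if span % 3 != 0:
        raise ValueError(f"span of {span} nt is not a whole number of codons")
    return span // 3


# First alignment block of the E. coli hit: 60 residues per line.
print(translated_length(1154985, 1155164))
# Whole E. coli hit: should equal the 334-residue query length.
print(translated_length(1154985, 1155986))
```

The same check applies to reverse-frame hits such as the Salmonella typhi contig (Frame = -1), where the subject coordinates decrease along the alignment.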



Score = 57.4 bits (136), Expect = 7e-08

Identities = 37/144 (25%), Positives = 59/144 (40%)

Frame = +3

Query: 21     GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80
              GR HHA L     G+G  ++   L++ L C+       CG C  C+ ++ G   D   +
Sbjct: 491418 GRIHHAYLFSGTRGVGKTSIARLLAKGLNCETGITATPCGVCDNCREIEQGRFVDLIEI- 491594

Query: 81     PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140
                   +   V+  R++ + +      G  KV  +                  EEPP
Sbjct: 491595 --DAASRTKVEDTRDLLDNVQYAPARGRFKVYLIDEVHMLSRHSFNALLKTLEEPPEHVK 491768

Query: 141    FFLATREPERLLATLRSRCRLHYL 164
              F LAT +P++L  T+ SRC   +L
Sbjct: 491769 FLLATTDPQKLPVTILSRCLQFHL 491840

gnl|Sanger|S.typhi_Contig369 Salmonella typhi unfinished fragment of complete genome 
Length = 5674 

Score = 490 bits (1248), Expect = e-138 

Identities = 229/334 (68%), Positives = 262/334 (77%) 

Frame = -1 

Query: 1    MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG 60
            M+WYPWLRP +EKLV SYQAGRGHHALLIQALPGMGD+AL YALSRYLLCQQP+GHKSCG
Sbjct: 2329 MKWYPWLRPAYEKLVESYQAGRGHHALLIQALPGMGDEALCYALSRYLLCQQPEGHKSCG 2150

Query: 61   HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXX 120
            HCRGCQLMQAGTHPDYYTL P+KGK++LGVDAVREV+EKL EH+RLGGAKVVW+
Sbjct: 2149 HCRGCQLMQAGTHPDYYTLTPDKGKSSLGVDAVREVSEKLYEHSRLGGAKVVWIADAALL 1970

Query: 121  XXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREV 180
                        EEPP +TWFFLA+ EP RLLATLRSRCRLH+LA P E YA++WLSREV
Sbjct: 1969 TDAAANALLKTLEEPPEQTWFFLASPEPARLLATLRSRCRLHHLAPPSESYAMSWLSREV 1790

Query: 181  TMSQDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQ 240
            T SQ+                    Q + W  RE LCQAL  S+ +GDWY+LL ALNHEQ
Sbjct: 1789 TASQEALLTALRLNAGSPGAALALLQSERWAQREALCQALMDSLHTGDWYALLTALNHEQ 1610

Query: 241  APARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLSPSRLQAILGDVCHIREQL 300
            APARLHWLATLL+DALKR HGA+ +TNVD   +VA LA  LSP+R+QAIL DVCH R+QL
Sbjct: 1609 APARLHWLATLLVDALKRQHGASYLTNVDADAVVAALAGPLSPARIQAILNDVCHCRDQL 1430

Query: 301  MSVTGINRELLITDLLLRIEHYLQPGVVLPVPHL 334
            + VTG+NREL++TDL+LRIEHYLQPG +L VPHL
Sbjct: 1429 LHVTGLNRELVLTDLILRIEHYLQPGTLLXVPHL 1328



gnl|Sanger|Y.pesits_Contig315 Yersinia pestis unfinished fragment of complete genome 
Length = 20197 

Score = 284 bits (720), Expect = 2e-76 

Identities = 147/334 (44%), Positives = 192/334 (57%), Gaps = 6/334 (1%) 
Frame = -1 

Query: 1     MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG 60
             M WYPWL   + +LV  +  GRGHHALL+ +LPG G+DALIYALSR+L+CQQ QG KSCG
Sbjct: 15274 MNWYPWLNAPYRQLVGQHSTGRGHHALLLHSLPGNGEDALIYALSRWLMCQQRQGEKSCG 15095

Query: 61    HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXX 120
              C  C+LM AG HPD+Y L PEKGK+++GV+ VR++ +KL  HA+ GGAKVVW+
Sbjct: 15094 ECHSCRLMLAGNHPDWYVLTPEKGKSSIGVELVRQLIDKLYSHAQQGGAKVVWLPHAEVL 14915

Query: 121 XXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLS 177 

EEPP +T+F L +P LLATLRSRC YLA P + WL+ 

Sbjct: 14914 TDAAANALLKTLEEPPEKTYFLLDCHQPASLLATLRSRCFYWYLACPDTAICLQWLNLQW 14735 

Query: 17 8 --REVTMSQDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAA 2 35 

R++ + Q + WRLCL++D SLL 

Sbjct: 14734 RKRQIPVEPVAMLAALKLSEGAPLAAERLLQPERWSIRSALCSGLREALNRSDLLSLLPQ 14555 

Query : 236 LNHEQAPARLHWLATLLMDALKRHHGAAQ- VTNVDVPGLVAELANHLSPSRLQAILGDVC 294 

LNH+ A RL WL++LL+DALK GA + N D LV +LA+ + L + + 
Sbjct: 14554 LNHDDAAERLQWLSSLLLDALKWQQGAGEFAVNQDQLPLVQQLAHIAATPVLLQLAKQLA 14375 



Query: 295   HIREQLMSVTGINRELLITDLLLRIEHYLQPGVVLPVPHL 334
             H R QL+SV G+NRELL+T+ LL  E  L  G    +P L
Sbjct: 14374 HCRHQLLSVVGVNRELLLTEQLLSWETALSTGTYSTLPSL 14255



gnl|CBCUMN|Pmultocida.990513.Contig500 Pasteurella multocida PM70 unfinished fragment of 
Length = 1241 

Score = 175 bits (439), Expect = 2e-43 

Identities = 102/319 (31%), Positives = 151/319 (46%), Gaps = 4/319 (1%) 
Frame = -1 

Query: 1 MRWYPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCG 60 

M YPWL P + + + + ++Q G GHHALL QA G+ + L++AL +L+CQQPQ + C 
Sbjct: 1196 MTLYPWLLPYYQQRIDAFQQGHGHHALLFQAEQGLSTEQLLFALGHWLICQQPQNQQPCQ 1017 

Query : 6 1 HCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAWEVTEKLNEHARLGGAKVVWVIOOCXXX 120 

C C L QA THPD YTL P + K+ +GVD VREV EK+N+HA+ GG K+ + +V 
Sbjct: 1016 QCHHCHLFQAQTHPDI YTLTPIENKD- IGVDQVREVNEKINQHAQQGGNKI I YVLGVSRL 840 

Query: 121 XXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREV 180 

EEP T+F L T + ++ T+ SRC+ LA P E A+ WL + + 
Sbjct : 839 TEAAANAMLKTLEEPRPNTYFLLYTEASDSVMPTIYSRCQTQKLALPAETSAIAWLQQQT 660 

Query: 181 TMSQDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQ 240 

T QD+R+ LL + 

Sbjct : 659 TQEIAAIQTALRISYGRPLHALTVLQQDLLEKRREFLRQFWLFYRKRSPLELLPFFDKAI 480 

Query: 241 APARLHWLATLLMDALKRHHGAAQVTN VDVPGLVAELANHLSPSRLQAILGDVCHI 296 

+L WL L DALK Q+ + D+ V +L+ S L + + 

Sbjct : 479 LLHQLDWLLAFLSDALK AKLQIKSDWLCQDLAAGVLQLSQQQSAQALLHATQIIQKV 309 

Query: 297 REQLMSVTGINRELLITDLLLRI 319 

R L + +N+EL++ D L ++ 
Sbjct: 308 RTDLTQINAVNQELILLDGLTQL 240 



gnl|CBCUMN|PMultocida.990407.Contig485 Pasteurella multocida PM70 unfinished fragment of 
Length = 1370 

Score = 174 bits (437), Expect = 3e-43 

Identities = 101/316 (31%), Positives = 150/316 (46%), Gaps = 4/316 (1%) 
Frame = -3 

Query : 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCR 63 

YPWL P + + + + ++Q G GHHALL QA G+ + L++AL +L+CQQPQ + C C 
Sbjct: 1218 YPWLLPYYQQRIDAFQQGHGHHALLFQAEQGLSTEQLLFALGHWLICQQPQNQQPCQQCH 1039 

Query : 6 4 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARL 123 

C L QA THPD YTL P + K+ +GVD VREV EK+N+HA+ GG K+++V 
Sbjct: 1038 HCHLFQAQTHPDI YTLTPIENKD- IGVDQVREVNEKINQHAQQGGNKI I YVLGVSRLTEA 862 

Query: 124 XXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMS 183 

EEP T+F L T + ++ T+ SRC+ LA P E A+ WL ++ T 
Sbjct : 861 AANAMLKTLEEPRPNTYFLLYTEASDSVMPTIYSRCQTQKLALPAETSAIAWLQQQTTQE 682 

Query: 184 QDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQAPA 243 

Q D + R + LL + 

Sbjct : 681 IAAIQTALRISYGRPLHALTVLQQDLLEKRREFLRQFWLFYRKRSPLELLPFFDKAILLH 502 

Query: 244 RLHWLATLLMDALKRHHGAAQVTN VDVPGLVAELANHLSPSRLQAILGDVCHIREQ 299 

+L WL L DALK Q+ + D+ V +L+ S L + +R 

Sbjct: 501 QLDWLLAFLSDALK AKLQIKSDWLCQDLAAGVLQLSQQQSAQALLHATQIIQKVRTD 331 



Query: 300 LMSVTGINRELLITDLLLRI 319 

L + +N+EL++ D L ++ 
Sbjct: 330 LTQINAVNQELILLDGLTQL 271 



gb|L42023|L42023 Haemophilus influenzae Rd complete genome 
Length = 1830138 

Score = 153 bits (384), Expect = 6e-37 

Identities = 97/316 (30%), Positives = 150/316 (46%), Gaps = 7/316 (2%) 
Frame = -2 

Query : 4 YPWLRPDFEKLVAS YQAGRGHHALLIQALPGMGDDALT YALSRYLLCQQPQGHKSCGHCR 6 3 

YPWL P + + + ++ G GHHA+LI+A G+G ++L AL++ + +C QG K CG C 
Sbjct: 477329 YPWLMPIYHQIAQTFDEGLGHHAVLIKADSGLGVESLFNALAQKIMCVA-QGDKPCGQCH 477153 

Query : 6 4 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXX 12 3 

C LMQA +HPDY+ L+P GK+ +GVD VR++ E + +HA+ G KW+V 
Sbjct: 477152 SCHLMQAHSHPDYHELSPINGKD-IGVDQVRDINEMVAQHAQQNGNKWYVQGAERLTEA 476976 

Query : 124 XXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMS 183 

EEP T+F L LLAT+ SRC+ + L+ P E+ A WL + + 

Sbjct: 476975 AANALLKTLEEPRPNTYFLLQADSSASLLATIYSRCQVWNLSVPNEEIAFEWLKSKSAVE 476796 

Query: 184 QDXXXXXXXXXXXXXXXXXXXFQGDNWQARETLCQALAYSVPSGDWYSLLAALNHEQAPA 2 43 

Q + R+ + LL + E+ 

Sbjct: 476795 NQEILTALAMNLGRPLLALETLQEGFIEQRKNFLRQFWVFYRRRSPLELLPLFDKERYVQ 476616 

Query: 244 RLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLS P - SRLQAI LG DVCHI 296 

+ + W+ L D LK +D VA+L + S Q LG + + 

Sbjct: 476615 QVDWILAFLSDCLKHK LEIDSHRQVADLGRGIEQFSDEQTALGLLQAIKIMQKV 476454 

Query: 297 REQLMSVTGINRELLITDLLLRI 319 

R L+++ G+N EL+ + D L R+ 
Sbjct: 476453 RSDLLTINGVNVELMLLDGLTRL 476385 



Score = 56.6 bits (134), Expect = 1e-07 

Identities = 36/143 (25%), Positives = 57/143 (39%) 

Frame = -2 

Query : 2 2 RGHHALLIQALPGMGDDALI YALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 8 1 

R HHA L G+G + + ++ L C CG C C+ ++ G D + 

Sbjct: 1299740 RLHHAYLFSGTRGVGKTSIARLFAKGLNCVHGVTATPCGECENCKAIEQGNFIDLIEI-- 1299567 

Query: 82 EKGKNTLGVDAWEVTEKLNEHARLGGAKVVWXXXXXXXXXXXXXXXXXXEEPPAETWF 141 

+ V+ RE+ + + +G KV + EEPP F 

Sbjct: 1299566 -DAASRTKVEDTRELLDNVQYKPWGRFKVYLIDEVHMLSRHSFNALLKTLEEPPEYVKF 1299390 

Query : 142 FLATREPERLLATLRSRCRLHYL 164 

LAT +P++L T+ SRC +L 
Sbjct: 1299389 LLATTDPQKLPVTILSRCLQFHL 1299321 



gnl|CBCUMN|F8P5 Pasteurella multocida PM70 unfinished fragment of complete genome 
Length = 550 

Score = 141 bits (351), Expect = 4e-33 

Identities = 64/149 (42%), Positives = 90/149 (59%) 

Frame = +3 



Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCR 63 

YPWL P +++ + ++Q G GHHALL QA G+G + L++AL +L+CQQPQ + C C 
Sbjct: 9 YPWLLPYYQQRIDAFQQGHGHHALLFQAEQGLGTEQLLFALGHWLICQQPQNQQPCQQCH 188 

Query: 64 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXX 123 

C L QA THPD YTL P + K+ +GVD VREV EK+N+HA+ GG K+ + +V 
Sbjct: 189 HCHLFQAQTHPDIYTLTPIENKD-IGVDQVREVNEKINQHAQQGGNKIIYVLGVSRLTEA 365 

Query: 124 XXXXXXXXXEEPPAETWFFLATREPERLL 152 

EEP T+F L T + ++ 
Sbjct: 366 AANAMLKTLEEPRPNTYFLLYTEASDSVM 452 



gnl|TIGR|V.cholerae_asm894 Vibrio cholerae unfinished fragment of complete genome 
Length = 19711 

Score = 123 bits (307), Expect = 6e-28 

Identities = 90/313 (28%), Positives = 136/313 (42%), Gaps = 3/313 (0%) 
Frame = -1 

Query : 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALI YALSRYLLCQQPQGHKSCGHCR 6 3 

YPWL P + + A AG+ A LIQA G+G ++L+ ++R L+C Q + CG C 
Sbjct: 18034 YPWLVPVWQPWQAGLAAGKISSATLIQASEGVGVESLVELMARTLMCTSSQS-EPCGFCH 17858 

Query : 6 4 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKWWVXXXXXXXXX 123 

C LMQ+G HPD+ + + PEK + + V+ +R++ E + + L G + + + + 

Sbjct: 17857 SCGLMQSGNHPDFHWKPEKIGKSITVEQIRQMNRIAQESSQLSGYRLIVIEPADAMNES 17678 

Query: 124 XXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMS 183 

EEP F L T + LL T+ SRC+ LP V WL + + 

Sbjct: 17677 SANALLKTLEEPAPNCLFILVTSRIKHLLPTIVSRCQRLVLPAPTTALWEWLKGQGITT 17498 

Query: 184 QDXXXXXXXXXXXXXXXXXXXFQGDNWQARET-LCQALAYSVPSGDWYSLL--AALNHEQ 2 40 

+ + E+ L AL SGD + L AL 

Sbjct: 17497 PAYALHLCADSPLKTRAFMLEGGAEKYHELESQLMNAL SGDVNAQLKCI ALIDAD 17333 

Query : 241 APARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHLSPSRLQAILGDVCHIREQL 3 00 

L+W+ +L DA K H G Q P A LA + S+L + + EQL 

Sbjct: 17332 LTTHLYWVWCVLTDAQKIHFGVQQDY YPPASAALAGRFTYSKLHVQTASLERLMEQL 17162 

Query: 301 MSVTGINRELLITDLL 316 

+G+N ELL+ L 
Sbjct: 17161 NQFSGLNTELLLLQWL 17114 



gnl|PAGP|Paeruginosa_Contig50 Pseudomonas aeruginosa unfinished fragment of complete 
Length = 798876 

Score = 115 bits (286), Expect = 2e-25 

Identities = 84/323 (26%), Positives = 139/323 (43%), Gaps = 11/323 (3%) 
Frame = +2 

Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALI YALSRYLLCQQPQGHKSCGHCR 63 

YPW + + +L Q HA L+ G+G AL + LLCQ+P +CG C + 

Sbjct: 521618 YPWQQALWSQLGGRAQHA HAYLLYGPAGIGKRALAEHWAAQLLCQRPAAAGACGECK 521788 

Query : 6 4 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAV^EVTEKLNEHARLGGAKVVWVXXXXXXXXX 12 3 

CQL+ AGTHPDY+ L PE+ + + VD VR++ + + A+LGG KW + 
Sbjct: 521789 ACQLLAAGTHPDYFVLEPEEAEKPIRVDQVI^DLVGFWQTAQLGGRKWLLEPAEAMNVN 521968 



Query: 124 XXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMS 183 

EEP +T L + +P RLL T++SRC P + + WL+R + 

Sbjct: 521969 AANALLKSLEEPSGDTVLLLISHQPSRLLPTIKSRCVQQACPLPGAAASLEWLARALPDE 522148 



Query: 184 [sequence not legible] 234

Sbjct: 522149 PAEALEELLALSGGSPLTAQRLHGQGVREQRAQWEGVKKLLKQQIAASPLAESW 522313

Query: 235 ALNHEQAPARLHWLATLLMDALKRH--HGAAQVTNVDVPGLVAELANHLSPSRLQAILGD 292
 N P W + L+ H + D+ ++ L + + + + A+
Sbjct: 522314 --NSVPLPLLFDWFCDWTLGILRYQLTHDEEGLGLADMRKVIQYLGDKSGQAKVLAMQDW 522487

Query: 293 VCHIREQLMSVTGINRELLITDLLLRIEHYLQPG 326
 + R+++++ +NR LL+ LL++ PG
Sbjct: 522488 LLQQRQKVLNKANLNRVLLLEALLVQWASLPGPG 522589

gnl|OUACGT|A.actin_Contig398 Actinobacillus actinomycetemcomitans unfinished fragment 
genome 

Length = 1469 

Score = 115 bits (285), Expect = 2e-25 

Identities = 88/293 (30%), Positives = 136/293 (46%), Gaps = 4/293 (1%) 
Frame = -1 

Query : 27 LLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKGKN 86 

LLI+A G+G + L L++ L+C P+ + CG C C LMQA +HPD+ +AP + K+ 
Sbjct: 1469 LLIRADEGLGAEQLCRLLAQRLMCLTPKSAEPCGECHACHLMQANSHPDFQHIAPIENKD 1290 

Query: 87 TLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATR 146 

+GVD +R + E+ ++HA+ G KV+++ EEP T+F L 

Sbjct: 1289 -IGVDQIRAMNEQASQHAQQNGNKVIYIEQAHRLTESAANAILKTLEEPRPNTYFILQND 1113 

Query: 147 EPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTM-SQDXXXXXXXXXXXXXXXXXXXF 205 

+ LL T+ SRC + + LP A+ WL + ++ + + 
Sbjct: 1112 MQKALLPTIYSRCQVWNLLPPATDTALHWLQAQTSVETPEILTALLVNYGRPLLALAMLT 933 

Query : 206 QGDNWQARETLCQA-LAYSVPSGDWYSLLAALNHEQAPARLHWLATLLMDALKRHHGAAQ 2 64 

Q Q RE L Q L Y S LL N E +L WL L D+LK + A Q 

Sbjct : 932 QHLPEQRREFLRQFWLFYRRRSP — LELLPFFNKEILLQQLDWLLAFLSDSLK-NKLAIQ 762 

Query : 265 VTNV- -DVPGLVAELANHLSPSRLQAILGDVCHIREQLMSVTGINRELLITDLLLRI 319 

+ D+ V + + LS L V +R L + +N+EL++ D L R+ 

Sbjct: 761 ENWICRDIERGVIQFSQGLSAPALLKATQIVGKVRSDLAANNALNQELILLDGLTRL 591 



gnl|TIGR|S.putrefaciens_gsp_271 Shewanella putrefaciens unfinished fragment of complet 
Length = 11991 

Score = 95.2 bits (233), Expect = 3e-19 

Identities = 51/181 (28%), Positives = 81/181 (44%) 

Frame = +3 

Query : 5 PWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRG 64 

PWL + + Q + HAL+ G + L ++R +C QP CG C + 

Sbjct: 1842 PWLDVPRQAFLTQLQTQKVPHAQLVGIDSAYGGELLSVFMARAAMCSQPTHTGGCGFCKS 2 021 

Query : 6 5 CQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEK^ 124 

CQL AG HPD+Y + E + + VD +RE+ +L+ A+ G +V + 
Sbjct: 2022 CQLFDAGNHPDFYQI--EADGHQIKVDQIRELCSRLSATAQQSGRRVAIIHHSERLNSAS 2195 



Query: 125 XXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMSQ 184 

EEP +T L + P RL+AT+ SRC + P + WL + + + + 

Sbjct: 2196 ANALLKTLEEPGKDTLLLLHSDTPARLMATISSRCQRLPFVAPSKTLIKNWLIQQCQIQE 2375 

Query: 185 D 185 
D 

Sbjct: 2376 D 2378 

gnl|Sanger|N.mening_Contig4 Neisseria meningitidis serogroup A unfinished fragment of cc 
Length = 236507 

Score = 88.9 bits (217), Expect = 2e-17 

Identities = 53/173 (30%), Positives = 86/173 (49%), Gaps = 8/173 (4%) 
Frame = -3 

Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQP-QGHKSCGHC 62 

YPW + + + +A + R+AL GGA + + LLC++P G+ CG C 

Sbjct: 209151 YPWHQEQWRQ-IAEHWTSRPN-AWLFVGKKGTGKTAFARFAAKALLCEKPVTGNVPCGEC 208978 

Query: 63 RGCQLMQAGTHPDYYTLAP EKGKNTLGV- -DAVREVTEKLNEHARLGGAKWWVX 115 

C L + G+HPD+Y + P E G+ L + DAVRE+ + + + GG +V+ + 

Sbjct: 208977 ASCHLFEQGSHPDFYEITPLTDERENGRKLLQIKIDAVREIIDNVYLTSVRGGLRVILIH 208798 

Query: 116 XXXXXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTW 17 5 

EEPP + F L + + ++L T++SRCR L P + A + 
Sbjct: 208797 PAESMNVQAANSLLKVLEEPPPQWFLLVSHAADKVLPTIKSRCRKMVLPAPSHEEASAY 208618 

Query: 176 L 176 
L 

Sbjct: 208617 L 208615 

gnl|OUACGT|Ngon_Contig191 Neisseria gonorrhoeae unfinished fragment of complete genome 
Length = 20169 

Score = 87.4 bits (213), Expect = 6e-17 

Identities = 54/173 (31%), Positives = 84/173 (48%), Gaps = 8/173 (4%) 
Frame = -1 

Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQ-GHKSCGHC 62 

YPW + + + +A + R+AL GGA + + LLC+ P G K CG C 

Sbjct: 4188 YPWHQEQWRQ-IAEHWTSRPN-AWLFVGKKGTGKTAFARFAAKALLCETPAPGCKPCGEC 4015 

Query: 63 RGCQLMQAGTHPDYYTLAP EKGKNTLGV- - DAVREVTEKLNEHARLGGAKWWVX 115 

C L G+HPD+Y + P E G+ L + DAVRE+ + + + GG +V+ + 

Sbjct: 4014 MSCHLFGRGSHPDFYEITPLADEPENGRKLLRIKIDAVREIIDNVYLTSVRGGLRVILIH 3835 

Query: 116 XXXXXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTW 175 

EEPP + F L + +++L T++SRCR LP A+ + 
Sbjct: 3834 PAESMNVQAANSLLKVLEEPPPQWFLLVSHAADKVLPTIKSRCRKMVLPAPSHGEALAY 3655 

Query: 17 6 L 176 
L 

Sbjct: 3654 L 3652 

gnl|TIGR|D.radiodurans_8842 Deinococcus radiodurans unfinished fragment of complete gene 
Length = 18340 



Score = 76.5 bits (185), Expect = 1e-13 



Identities = 47/150 (31%), Positives = 67/150 (44%) 
Frame = +2 



Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73 

L + + GR HA L G+G + + CP K CG C C ++AG+H 

Sbjct: 13439 LRTALEQGRIGHAYLFSGPRGVGKTTTARLIAMTANCTGP-APKPCGECESCLAVRAGSH 13615 

Query: 74 PDYYTLAPEKGKNTLGVT)AVREVTEKLNEHARLG 133 

PD + + VD VR++ EK+ A GG K+ + E 
Sbjct: 13616 PDVME I DAASNNS VDDVRDLREKVGLAAMRGGKKIYILDEAHMMSRAAFNALLKTLE 13786 

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHY 163 

EPP F LAT EPE+++ T+ SRC+ HY 
Sbjct: 13787 EPPEHVIFILATTEPEKIIPTILSRCQ-HY 13873 

gnl|Sanger|B.pertussis_Contig654 Bordetella pertussis unfinished fragment of complete ge 
Length = 10062 

Score = 73.4 bits (177), Expect = 1e-12 

Identities = 55/178 (30%), Positives = 77/178 (42%), Gaps = 30/178 (16%) 
Frame = -2 

Query : 2 RWYPWLRPDFEKLVASYQAGRGH- -HALLIQALPGMGDDALI YALSRYLLCQQPQGHKSC 5 9 

R+ PW + + S+ +GR HA LI G+G A + LLC+ P+ +C 

Sbjct: 6023 RFLPWQT EIARSWLSGRDRFAHAWLIHGNGGIGKLDFTAAAAASLLCESPRQGLAC 585 6 

Query: 60 GHCRGCQLMQAGTHPDYYTLAPEK GKNTLGVD 91 

G C C + +G HPD + PE + +D 

Sbjct: 5855 GECAACAWVASGNHPDLRRIRPEAVALEEGADQTEGAEEAEAGSGGAAAKRAPSKDIRID 5676 

Query: 92 AV11EVTEKLNEHARLGGAKWWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATREPERL 151 

+R + N GG +V + EEPPA T F L P+RL 

Sbjct: 5675 QIRALEPWFNTATHRGGWRVALLYPAHALNVISANALLKVLEEPPAHTVFLLVADAPDRL 5496 

Query: 152 LATLRSRCRLHYLAGPPEQYAVTWLSRE 179 

L TL SRCR L A+ WL + 

Sbjct: 5495 LPTLVSRCRRLPLPTXSAGQALQWLGEQ 5412 

gnl|OUACGT|Ngon_Contig223 Neisseria gonorrhoeae unfinished fragment of complete genome
Length = 90586

Score = 73.0 bits (176), Expect = 1e-12

Identities = 44/139 (31%), Positives = 62/139 (43%)

Frame = -2

Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80

GR HHA L+ G+G + L++ L C+ Q + CG C+ C + AG + D L 
Sbjct: 72852 GRLHHAYLLTGTRGVGKTTIARILAKSLNCENAQHGEPCGVCQSCTQIDAGRYVD--LLE 72679 

Query: 81 PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140

+ NT G+D +REV E G KV + EEPP 

Sbjct: 72678 IDAASNT-GIDNIREVLENAQYAPTAGKYKVYIIDEVHMLSKSAFNAMLKTLEEPPEHVK 72502 

Query: 141 FFLATREPERLLATLRSRC 159 

F LAT +P ++ T+ SRC 
Sbjct: 72501 FILATTDPHKVPVTVLSRC 72445

gnl|Sanger|N.mening_Contig3 Neisseria meningitidis serogroup A unfinished fragment of cc
Length = 291782

Score = 73.0 bits (176), Expect = 1e-12

Identities = 44/139 (31%), Positives = 62/139 (43%)

Frame = +2

Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80 

GR HHA L+ G+G + L++ L C+ Q + CG C+ C + AG + D L 
Sbjct: 180815 GRLHHAYLLTGTRGVGKTTIARILAKSLNCENAQHGEPCGVCQSCTQIDAGRYVD--LLE 180988

Query: 81 PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140

+ NT G+D +REV E G KV + EEPP 

Sbjct: 180989 IDAASNT-GIDNIREVLENAQYAPTAGKYKVYIIDEVHMLSKSAFNAMLKTLEEPPEHVK 181165 

Query: 141 FFLATREPERLLATLRSRC 159 

F LAT +P ++ T+ SRC 
Sbjct: 181166 FILATTDPHKVPVTVLSRC 181222 



Score = 41.4 bits (95), Expect = 0.004 
Identities = 16/37 (43%), Positives = 25/37 (67%) 
Frame = +2 

Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDAL 40

YPWL P + ++ ++ G GHHA+LI+A G+G + L 
Sbjct: 268937 YPWLMPIYHQIAQTFDEGLGHHAVLIKADAGLGVERL 269047



emb|AL123456|MTBH37RV Mycobacterium tuberculosis H37Rv complete genome
Length = 4411529

Score = 69.1 bits (166), Expect = 2e-11

Identities = 49/158 (31%), Positives = 68/158 (43%), Gaps = 5/158 (3%)
Frame = -3

Query: 16 ASYQAGRGH---HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGT 72

+ + + AG G HA L+ PG G + L C G CG CR C AGT

Sbjct: 4082634 SAHSAGGGGTMTHAWLLTGPPGSGRSVAALCFAAALQCTSG-GEPGCGRCRACTTTLAGT 4082458 

Query: 73 HPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXX 132

H D + PE ++GVD +R + + G ++V + 

Sbjct: 4082457 HADVRRVIPE--GLSIGVDEMRAIVQIAARRPTTGHWQIWIEDADRLTEGAANALLKW 4082284 

Query: 133 EEPPAETWFFLA--TREPERLLATLRSRCRLHYLAGPPEQYAV 173 

EEPP T F L + +PE + TLRSRCR H P +A+ 

Sbjct: 4082283 EEPPPSTVFLLCAPSVDPEDIAVTLRSRCR-HVALVTPSTHAI 4082158



Score = 55.8 bits (132), Expect = 2e-07 

Identities = 44/150 (29%), Positives = 58/150 (38%) 

Frame = -2 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

L + AGR +HA L G G + L+R LCQ CGCC+A 

Sbjct: 4166656 LSVALDAGRINHAYLFSGPRGCGKTSSARILARSLNCAQGPTANPCGVCESCVSL-APNA 4166480 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

P + + GVD RE+ ++ +V V E 

Sbjct: 4166479 PGSIDWELDAASHGGVDDTRELRDRAFYAPVQSRYRVFIVDEAHMVTTAGFNALLKIVE 4166300 



Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHY 163 



EPP F AT EPE++L T+RSR HY 
Sbjct: 4166299 EPPEHLIFIFATTEPEKVLPTIRSRTH-HY 4166213



gnl|TIGR|gmt3711 Mycobacterium tuberculosis unfinished fragment of complete genome
Length = 56385

Score = 69.1 bits (166), Expect = 2e-11

Identities = 49/158 (31%), Positives = 68/158 (43%), Gaps = 5/158 (3%)
Frame = -3

Query: 16 ASYQAGRGH---HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGT 72

+ + + AG G HA L+ PG G + L C G CG CR C AGT 

Sbjct: 44926 SAHSAGGGGTMTHAWLLTGPPGSGRSVAALCFAAALQCTSG-GEPGCGRCRACTTTLAGT 44750 

Query: 73 HPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXX 132

H D + PE ++GVD +R + + G + +V + 

Sbjct: 44749 HADVRRVIPE--GLSIGVDEMRAIVQIAARRPTTGHWQIWIEDADRLTEGAANALLKW 44576 

Query: 133 EEPPAETWFFLA--TREPERLLATLRSRCRLHYLAGPPEQYAV 173 

EEPP T F L + +PE + TLRSRCR H P +A+ 

Sbjct: 44575 EEPPPSTVFLLCAPSVDPEDIAVTLRSRCR-HVALVTPSTHAI 44450 



gnl|Sanger_1765|mbovis_Contig1041.0 Mycobacterium bovis unfinished fragment of complete
Length = 10794

Score = 69.1 bits (166), Expect = 2e-11

Identities = 49/158 (31%), Positives = 68/158 (43%), Gaps = 5/158 (3%)
Frame = +3

Query: 16 ASYQAGRGH---HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGT 72

+ + + AG G HA L+ PG G + L C G CG CR C AGT 

Sbjct: 4962 SAHSAGGGGTMTHAWLLTGPPGSGRSVAALCFAAALQCTSG-GEPGCGRCRACTTTLAGT 5138 

Query: 73 HPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXX 132

H D + PE ++GVD +R + + G ++V + 

Sbjct: 5139 HADVRRVIPE--GLSIGVDEMRAIVQIAARRPTTGHWQIWIEDADRLTEGAANALLKW 5312

Query: 133 EEPPAETWFFLA--TREPERLLATLRSRCRLHYLAGPPEQYAV 173 

EEPP T F L + +PE + TLRSRCR H P +A+ 

Sbjct: 5313 EEPPPSTVFLLCAPSVDPEDIAVTLRSRCR-HVALVTPSTHAI 5438 



gb|AE000657|AE000657 Aquifex aeolicus complete genome
Length = 1551335

Score = 67.5 bits (162), Expect = 6e-11

Identities = 39/136 (28%), Positives = 58/136 (41%) 

Frame = +1 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 

HA L G+G + L++ L C+ P + CG C C+ + G PD + 

Sbjct: 1303996 HAYLFAGPRGVGKTTIARILAKALNCKNPSKGEPCGECENCREIDRGVFPDLIEMDAASN 1304175 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

+ G+D VR + E +N G KV + EEPP T F L 

Sbjct: 1304176 R---GIDDVT^LKEAVmKPIKGKYKVYIIDEAHMLTKEAFNALLKTLEEPPPRTVFVLC 1304346 



Query: 145 TREPERLLATLRSRCR 160 

T E +++L T+ SRC+ 



Sbjct: 1304347 TTEYDKILPTILSRCQ 1304394 



Score =43.0 bits (99), Expect = 0.001 

Identities = 35/132 (26%), Positives = 56/132 (41%), Gaps = 28/132 (21%) 
Frame = +3 



Query: 27 LLIQALPGMGDDALIYALSRYLLCQQ--PQGHKSCGHCRGCQLMQA 70

LL G G + ++ +LC + + P G SC C+ + + 

Sbjct: 1082652 LLFYGKEGSGKTKTAFEFAKGILCKENVPWGCGSCPSCKHVNELEEAFFKGEIEDFKVYK 1082831

Query: 71 GTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXX 118

G HPD+ + P + + ++ +REV L KV+ + 

Sbjct: 1082832 DKDGKKHFVYLMGEHPDFWIIPSG--HYIKIEQIREVKNFAYVKPALSRRKVIIIDDAH 1083005

Query: 119 XXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSR 158

EEPPA+T F L T +L T+ SR 

Sbjct: 1083006 AMTSQAANALLKVLEEPPADTTFILTTNRRSAILPTILSR 1083125



gnl|Sanger|B.pertussis_Contig889 Bordetella pertussis unfinished fragment of complete
Length = 1034 

Score = 64.4 bits (154), Expect = 5e-10 

Identities = 41/138 (29%), Positives = 57/138 (40%) 

Frame = +2 

Query: 22 RGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 81 

R HHA L G+G L L++ L C + K CG CR C + AG DY L 

Sbjct: 626 RLHHAWLFTGTRGVGKTTLSRILAKSLNCENGITSKPCGQCRACTEIDAGRFVDYLELDA 805 

Query: 82 EKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWF 141

+ GV+ + ++ E+ G KV + EEPP F 

Sbjct: 806 ASNR---GVEEMTQLLEQAVYAPGAGRFKVYMIDEVHMLTGHAFNAMLKTLEEPPPHVKF 976

Query: 142 FLATREPERLLATLRSRC 159 

LAT + P+ + T+ SRC 
Sbjct: 977 ILATTDPQIIPVTVLSRC 1030 

gnl|TIGR|t_ferrooxidans_1986 Thiobacillus ferrooxidans unfinished fragment of complete
Length = 733 

Score = 64.4 bits (154), Expect = 5e-10 

Identities = 46/149 (30%), Positives = 66/149 (43%), Gaps = 7/149 (4%) 
Frame = -3 

Query: 28 LIQALPGMGDDALIYALSRYLLCQQPQGHK-SCGHCRGCQLMQAGTHPDYYTLAP 81

L QA+ G+ + A L + LC P CG CR C+L+ G HPD + P 

Sbjct: 542 LPQAMLAAGESGTLVAQYCDDLQQVALCFAPTAQGLPCGTCRSCRLLAEGNHPDLLMITP 363 

Query: 82 EKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWF 141

E GK + ++AVR EL ++ + + + EEP A 

Sbjct: 362 ETGKR-ITIEAVRHANEFLAFTPQVSACRWLRIAPAEAMTAAAANALLKTLEEPAARAHI 186 

Query: 142 FLATREPERLLATLRSRC-RLHYLAGPPEQYAVTWL 176 

L + P +L+ T+RSR RL + P Q V WL 
Sbjct: 185 LLLSEHPSQLIPTIRSRLQRLPFPTMLPGQ-CVNWL 81 



gnl|TIGR|C.tepidum_gct_9 Chlorobium tepidum unfinished fragment of complete genome



Length = 255408 



Score =64.0 bits (153), Expect = 7e-10 

Identities = 54/170 (31%), Positives = 78/170 (45%), Gaps = 45/170 (26%) 
Frame = -3 

Query: 9 PDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQP QGHKSCGHCRGC 65 

P L+ARHAL GG +++ + L++ L C+ G SCG C C 

Sbjct: 252943 PQLRVLKTALGANRLAHAYLFTGPEGSGKESVAFELAKILNCRSSGNLSGEGSCGECESC 252764 

Query: 66 QLMQAGTHPD YYTLAPEKGKN 86 

+ HP+ Y L EK KN 

Sbjct: 252763 RQTDLLMHPNIEYLFPVEAALLETIDPSKKENKKLTEARERYEALLDEKRKNPFFTPAME 252584 

Query: 87 -TLGV--DAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFL 143

++G+ +V ++K+ RGGKV + EEPPA F L 

Sbjct: 252583 RSMG I LTEQWMLQQKAS LAPRDGGKKVF 1 1 SQAERLH PTAANKLLKLLEE P PAHWF I L 252404 

Query: 144 ATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSR 178 

+ PE +L T+RSRC+L A P W++R 
Sbjct: 252403 VSSRPESVLPTIRSRCQLLNFARPRPAEIEAWIAR 252299 



gnl|TIGR|gef_6277 Enterococcus faecalis unfinished fragment of complete genome
Length = 9336

Score = 62.8 bits (150), Expect = 2e-09

Identities = 37/153 (24%), Positives = 66/153 (42%), Gaps = 1/153 (0%)
Frame = -1

Query: 11 FEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQA 70

+ ++L S+ + GR HA L + G G +++++ C + C C C + 

Sbjct: 8865 YKQLQKSFEHGRLAHAYLFEGDTGTGKQEFGLWMAKHVFCTNLVNQQPCNECHNCVRINE 8686 

Query: 71 GTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXX 130

HPD +AP+ T+ V+ +RE+ + ++ KV + 

Sbjct: 8685 NEHPDVLRIAPD--GQTIKVNQIRELKAEFSKSGVETAKKVFLIQEADKMSTGAANSLLK 8512 

Query: 131 XXEEPPAETWFFLATREPERLLATLRSRCR-LHY 163 

EEP + L T R+L T++SRC+ LH+ 
Sbjct: 8511 FLEEPEGQILAILETTSLSRILPTIQSRCQTLHF 8410 



gnl|Sanger|Y.pesits_Contig790 Yersinia pestis unfinished fragment of complete genome
Length = 98765

Score = 62.8 bits (150), Expect = 2e-09

Identities = 40/144 (27%), Positives = 60/144 (40%)

Frame = -3

Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80

GR HHA L G+G + + L++ L C+ CG C CQ ++ G D + 

Sbjct: 63444 GRIHHAYLFSGTRGVGKTSIARLLAKGLNCETGITATPCGTCANCQEIEQGRFVDLIEI- 63268 

Query: 81 PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140

+ V+ RE+ + + G KV + EEPPA 

Sbjct: 63267 --DAASRTKVEDTRELLDNVQYAPARGRFKVYLIDEVHMLSRHSFNALLKTLEEPPAHVK 63094 

Query: 141 FFLATREPERLLATLRSRCRLHYL 164 

F LAT +P++L T+ SRC +L 
Sbjct: 63093 FLLATTDPQKLPVTILSRCLQFHL 63022



gnl|TIGR|C.trachomatis_ct_97 Chlamydia trachomatis MOPN unfinished fragment of complete
Length = 4554 

Score =62.5 bits (149), Expect = 2e-09 

Identities = 41/161 (25%), Positives = 67/161 (41%), Gaps = 1/161 (0%) 
Frame = -2 

Query: 17 SYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQP-QGHKSCGHCRGCQLMQAGTHPD 75 

S+RHA+ +GG L ++LCQP++CCC++GTD 
Sbjct: 1487 SLRLNRSAHAYIFSGIRGTGKTTLARVFAKALNCQSPTENQEPCNQCAICKEISLGTSMD 1308

Query: 76 YYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEP 135

+ G + G+ + +R++ E + K+ + EEP 

Sbjct: 1307 VMEI---DGASHRGIEDIRQINETVLXVPSKSRYKIYIIDEVHMLTKEAFNSLLKTLEEP 1137

Query: 136 PAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLS 177

PA FFLAT E ++ T+ SRC+ L PE+ + L+ 
Sbjct: 1136 PAHVKFFLATTEIAKIPNTISSRCQKMLLKRIPEETIIDKLT 1011



gnl|PAGP|Paeruginosa_Contig53 Pseudomonas aeruginosa unfinished fragment of complete ger
Length = 1300758

Score = 62.1 bits (148), Expect = 3e-09

Identities = 69/268 (25%), Positives = 103/268 (37%), Gaps = 12/268 (4%)
Frame = +2

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

L+ + R HHA L G+G + L+ + L C+ CG C O + G 

Sbjct: 943820 LINALDNQRLHHAYLFTGTRGVGKTTIARILAKCLNCETGVSSTPCGECSVCREIDEGRF 943999 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

D L + V+ RE+ + + G KV + E 

Sbjct: 944000 VD---LIEVDAASRTKVEDTRELLDNVQYSPTRGRYKVYLIDEVHMLSSHSFNALLKTLE 944170

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWL-----SREVTMSQDXXX 188

EPP F LAT + P+ + L T+ SRC LP+VL +V D 

Sbjct: 944171 EPPPHVKFLLATTDPQKLPVTILSRCLQFSLKNMPPERWEHLTHVLGAENVPFEDDALW 944350 

Query: 189 XXXXXXXXXXXXXXXXFQGDNWQARETLCQALAY SVPSGDWYSLLAALNHEQA 241 

G A QA+A+ V + D + + L L+H Q 

Sbjct: 944351 LLGRAA DGSMRDAMSLTDQAIAFGEGKVLAADVRAMLGTLDHGQVYGVL 944497 

Query: 242 PARLHWLATLLMDALKRHHGAAQVTNVDVPGLVAELANHL 281

A L A L++A++ H A Q D G++AE+ N L 
Sbjct: 944498 QALLEGDARALLEAVR--HLAEQ--GPDWGGVLAEILNVL 944605



gnl|TIGR|N.meningitidis_GNMCF18R Neisseria meningitidis MC58 unfinished fragment of cornt
Length = 639

Score =60.9 bits (145), Expect = 6e-09 

Identities = 39/125 (31%), Positives = 52/125 (41%) 

Frame = -1 

Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80 

GR HHA L+ G+G + L++ LC+ Q +CGC C +AG + D L 
Sbjct: 369 GRLHHAYLLTGTRGVGKTTIARILAKSLNCENAQHGEPCGVCESCTQIDAGRYVD--LLE 196



Query: 81 PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140 

+ NT G+D +REV E G KV + EEPP 

Sbjct: 195 IDAASNT-GIDNIREVLENAQYAPTAGKYKVYIIDEVHMLSKSAFNAMLKTLEEPPEHVK 19 



Query: 141 FFLAT 145 

F LAT 
Sbjct: 18 FILAT 4 



gnl|TIGR|T.maritima_tm_26 Thermotoga maritima unfinished fragment of complete genome
Length = 18920 

Score = 60.9 bits (145), Expect = 6e-09 

Identities = 37/157 (23%), Positives = 63/157 (39%) 

Frame = -2 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

+++Q H+ GG L L++ L C+ +G + C CR C+ + GT 

Sbjct: 5536 IIGAIQKNSVAHGYIFAGPRGTGKTTLARILAKSLNCENRKGVEPCNSCRACREIDEGTF 5357 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

D L + G+D +R+ + + GKV+ E 

Sbjct: 5356 MDVIELDAASNR---GIDEIRRIRDAVGYRPMEGKYKVYIIDEVHMLTKEAFNALLKTLE 5186

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQ 170 

EPP+ F LAT E+ + T+ SRC++ P++ 
Sbjct: 5185 EPPSHWFVLATTNLEKVPPTIISRCQVFEFRNIPDE 5075 



gnl|TIGR|P.gingivalis_1194 Porphyromonas gingivalis W83 unfinished fragment of complete
Length = 418115

Score = 59.7 bits (142), Expect = 1e-08

Identities = 79/303 (26%), Positives = 126/303 (41%), Gaps = 53/303 (17%) 
Frame = +2 



Query : 2 5 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 81 

HA L G G LA +RYL CQ P +CGHC C A HPD + + P 

Sbjct: 102800 HAQLFAGEEGGGAFPLALAYARYLNCQMPTDTDACGHCPSCVKYDALAHPDLFFVYPWN 102979 

Query: 82 EKGKNTLGVDA VREVTEKLNEHAR 105 

+ + LG ++ V +KL+ 

Sbjct: 102980 ASSSPAPSDDYIRQWREMLGSESYFTPADWLEYIKAGNSQPIIYSKEAEAVEQKLSFRIY 103159 

Query: 106 LGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLA 165 

+W + EEPP T FF+ + EP+++L T+RSR +L + 

Sbjct: 103160 EASYRWMIWQPERMNEAMANKLLKLIEEPPEHTLFFMISSEPDKVLGTIRSRTQLINVR 103339 

Query: 166 GPPEQYAVTWLSREVTMSQDXXXXXXXXXXXXXXXXXXXFQGDNWQARE--TLCQALAYS 223

E V LSR + ++G+ W R+ L + S 

Sbjct: 103340 LLHEIEIVEALSRNNQGNTADIIRIAHLAEGNYRRAMDLYRGE-WADRDNFVLMGRMMGS 103516

Query: 224 VPSGDWYSL LAALNHEQAPARLHWLATLLMDALKRHHGAAQVT--NVDVPGLVA 275 

+ GD + LAAL L + L + G A++ + + V 

Sbjct: 103517 IIKGDPSKMRPVADELAALGRVSQIGFLTYCLRLFRELYISRVGVAKLNYLSPEEESFVD 103696 

Query: 276 ELANHLSPSRLQAILGDV CHIREQLMSVTGINRELLITDLLLRIEHYLQPGV 327 

L+ ++ ++ ++ +V HIR+ N ++ DLLLR+ L P + 

Sbjct: 103697 ML SGG I TGQNI RP VME EVE L A I RH I RQ NGNGRMIFFDLLLRLTAALAPAL 103846 



gnl|Sanger|S.typhi_Contig376 Salmonella typhi unfinished fragment of complete genome
Length = 157214 

Score = 59.3 bits (141), Expect = 2e-08 

Identities = 38/144 (26%), Positives = 60/144 (41%) 

Frame = +1 

Query: 21 GRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLA 80

GR HHA L G+G ++ L++ L C+ CG C C+ ++ G D + 

Sbjct: 13384 GRIHHAYLFSGTRGVGKTSIARLLAKGLNCETGITATPCGVCDNCREIEQGRFVDLIEI- 13560 

Query: 81 PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETW 140

+ V+ R++ + + GKV+ EEPPA 

Sbjct: 13561 --DAASRTKVEDTRDLLDNVQYAPARGRFKVYLIDEVHMLSRHSFNALLKTLEEPPAHVK 13734 

Query: 141 FFLATREPERLLATLRSRCRLHYL 164 

F LAT + P+ + L T+ SRC + L 
Sbjct: 13735 FLLATTDPQKLPVTILSRCLQFHL 13806 



gnl|OUACGT|Spyogenes_Contig243 Streptococcus pyogenes unfinished fragment of complete
Length = 22344

Score = 59.3 bits (141), Expect = 2e-08

Identities = 34/140 (24%), Positives = 60/140 (42%) 

Frame = +1 

Query: 22 RGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 81

R +HA L + + + L++ + C+Q + CGHCR CQL+ + G D LP 

Sbjct: 17944 RLNHAYLFSG--DFANEEMALFLAKVIFCEQKKDQTPCGHCRSCQLIEQGDFADVTVLEP 18117 

Query: 82 EKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWF 141

+ D V+E+ + + +V + EEP E + 

Sbjct: 18118 T--GQVIKTDWKEMMANFSQTGYENKRQVFIIKDCDKMHINAANSLLKYIEEPQGEAYI 18291 

Query: 142 FLATREPERLLATLRSRCRL 161 

FL T + ++L T+ + SR + + 
Sbjct: 18292 FLLTNDDNKVLPTIKSRTQV 18351



gnl|TIGR|S.putrefaciens_gsp_387 Shewanella putrefaciens unfinished fragment of complet
Length = 3834

Score = 58.9 bits (140), Expect = 2e-08

Identities = 44/163 (26%), Positives = 61/163 (36%) 

Frame = +1 

Query: 22 RGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 81

R HHA L G+G +L ++ L C+ CG C C + G D L 
Sbjct: 562 RLHHAYLFTGTRGVGKTSLARLFAKGLNCETGVTASPCGVCGSCVEIAQGRFVD---LIE 732

Query: 82 EKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWF 141

+ VD RE+ + + G KV + EEPP F 

Sbjct: 733 VDAASRTKVDDTRELLDNVQYRPTRGRFKVYLIDEVHMLSRSSFNALLKTLEEPPEHVKF 912

Query: 142 FLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSREVTMSQ 184 

LAT +P++L T+ SRC L +Q T L +T Q 
Sbjct: 913 LLATTDPQKLPVTVLSRCLQFNLKSLTQQEIGTQLQHILTQEQ 1041 



gnl|TIGR|M.avium_5593 Mycobacterium avium unfinished fragment of complete genome



Length = 21394 



Score = 58.2 bits (138), Expect = 4e-08 

Identities = 46/152 (30%), Positives = 60/152 (39%) 

Frame = +2 

Query: 12 EKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAG 71 

EL + +AGR +HA L G G + L+R LCQ CGCCLA 

Sbjct: 9860 EPLSIALEAGRINHAYLFSGPRGCGKTSSARILARSLNCVQGPTATPCGVCDSC-LALAP 10036 

Query: 72 THPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXX 131

P + + GVD RE+ + + +V V 

Sbjct: 10037 NAPGSIDWELDAASHGGVDDTRELRDRAFYAPAQSRYRVFIVDEAHMVTTAGFNALLKI 10216 

Query: 132 XEEPPAETWFFLATREPERLLATLRSRCRLHY 163 

EEPP F AT EPE++L T+RSR HY 
Sbjct: 10217 VEEPPEHLIFIFATTEPEKVLPTIRSRTH-HY 10309 



emb|AL009126|BSUB Bacillus subtilis complete genome
Length = 4214814 

Score = 58.2 bits (138), Expect = 4e-08 

Identities = 43/154 (27%), Positives = 72/154 (45%), Gaps = 3/154 (1%) 
Frame = +1 

Query: 7 LRPDFEKLVA-SYQAGRGHHALLIQALPGMG--DDALIYALSRYLLCQQPQGHKSCGHCR 63 

L+P KL+ S+RHAL+ GGD AL+ AS + L G + CCR 

Sbjct: 40693 LQPRVMKLLYNSIEKDRLSHAYLFEGKKGTGKLDAALLLAKSFFCL EGGAEPCESCR 40863 

Query: 64 GCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXX 123

C+ + ++G HPD + + P+ + + + + + E+ ++ K+ + 

Sbjct: 40864 NCKRIESGNHPDLHLVQPD--GLSIKKAQIQALQEEFSKTGLESHKKLYIISHADQMTAN 41037 

Query: 124 XXXXXXXXXEEPPAETWFFLATREPERLLATLRSRCR 160 

EEP +T L T +P+RLL T+ SRC+ 
Sbjct: 41038 AANSLLKFLEEPNKDTMAVLITEQPQRLLDTIISRCQ 41148 



Score = 49.6 bits (116), Expect = 1e-05

Identities = 32/136 (23%), Positives = 52/136 (37%) 

Frame = +1 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

HA L G G + + + + C+ + C C C+ + G+ D + 

Sbjct: 26926 HAYLFSGPRGTGKTSAAKIFAKAVNCEHAPVDEPCNECAACKGITNGSISDVIEIDAASN 27105


Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

GVD +R++ +K+ KV + EEPP F LA 

Sbjct: 27106 N---GVDEIRDIRDKVKFAPSAVTYKVYIIDEVHMLSIGAFNALLKTLEEPPEHCIFILA 27276

Query: 145 TREPERLLATLRSRCR 160 

T EP ++ T+ SRC+ 
Sbjct: 27277 TTEPHKIPLTIISRCQ 27324



gnl|UOKNOR|S.mutans_Contig840 Streptococcus mutans unfinished fragment of complete genon
Length = 5373 

Score = 58.2 bits (138), Expect = 4e-08 

Identities = 36/153 (23%), Positives = 66/153 (42%) 



Frame = -1

Query: 11 FEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQA 70

Query: 71 GTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXX 130

D + PE + +R+ + + + G + +V + 

Sbjct: 4134 NDFSDVKVIEPE--GQMIKTATIRDLLREFSSSGFEGQSQVFIIRDADKMHTNAANSLLK 3961

Query: 131 XXEEPPAETWFFLATREPERLLATLRSRCRLHY 163

EEP ++T+ L T++ R+L T+ + SR + + Y 

Sbjct: 3960 FIEEPQSDTYMILLTQDESRILPTIKSRTQIFY 3862





gb|AE001273|AE001273 Chlamydia trachomatis complete genome
Length = 1042519 



Score = 57.8 bits (137), Expect = 5e-08

Identities = 37/144 (25%), Positives = 59/144 (40%), Gaps = 1/144 (0%)

Frame = +2

Query: 17 SYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQP-QGHKSCGHCRGCQLMQAGTHPD 75

S+RHA+ +GG L ++LCQPQ+CCC++GTD 

Sbjct: 381368 SLRLNRAAHAYIFSGIRGTGKTTLARVFAKALNCQNPTQDQEPCNQCAICKEISLGTSMD 381547

Query: 76 YYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEP 135

+ G + G+ + +R+ + E + K+ + EEP 

Sbjct: 381548 VIEI---DGASHRGIEDIRQINETVLFVPSKSRYKIYIIDEVHMLTKEAFNSLLKTLEEP 381718

Query: 136 PAETWFFLATREPERLLATLRSRCR 160

P FFLAT E ++ T+ SRC+ 

Sbjct: 381719 PVHVKFFLATTEIAKIPNTISSRCQ 381793



Score =35.6 bits (80), Expect = 0.25 

Identities = 20/87 (22%), Positives = 34/87 (38%) 

Frame = -1 

Query: 73 HPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXX 132

HPD + +P+ ++ R + + + H K+ + 

Sbjct: 209608 HPDMHEYSPQGKGRLHTIETPRAIRKDIWIHPYESPYKIYIIYEADRITLDAISAFLKLL 209429 

Query: 133 EEPPAETWFFLATREPERLLATLRSRC 159 

E+PP F L + P+RL T+RSRC 
Sbjct: 209428 EDPPQYGMFILVSALPQRLPPTIRSRC 209348



gnl|TIGR|C.crescentus_gcc_764 Caulobacter crescentus unfinished fragment of complete ger
Length = 943

Score = 57.0 bits (135), Expect = 9e-08

Identities = 43/164 (26%), Positives = 68/164 (41%), Gaps = 8/164 (4%)
Frame = +3

Query: 15 VASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQ----A 70

+ + + GR HHA L+ G+G L Y ++R LL + P + + + A 

Sbjct: 366 IDALERGRLHHAWLLTGPEGVGKATLAYRMARRLLGARPDPSQGLLGAAPSDWSRQVAA 545 

Query: 71 GTHPDYYTLA----PEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXX 126

+HPD L K + ++ VD R+ + E + +V + 



Sbjct: 546 RSHPDLMVLERLTDDGKARKSIPVDEARKLPEFFANSPAVSPYRVAIIDAADDLNVNAAN 725 

Query: 127 XXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQYAVTWLSR 178 

EEPPA L + P + LL T+RSRCR + P A + R 

Sbjct: 726 AVLKTLEEPPARGVILLISHAPGKLLPTIRSRCRRLAIPAPGVAAAAXMVER 881 



gnl|Sanger|campylo_Cj.seq Campylobacter jejuni NCTC 11168 unfinished fragment of complet
Length = 1641480

Score = 56.6 bits (134), Expect = 1e-07

Identities = 38/134 (28%), Positives = 53/134 (39%)

Frame = +2

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

HA L L G G + SR L+C+Q CG C+ C G H D + 

Sbjct: 1089785 HAYLFSGLRGSGKTSSARIFSRALVCEQGPSDTPCGTCKHCLAALEGKHIDIIEMDAASN 1089964 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

+ + A+ E T+ AR K+ + EEPP+ F LA 

Sbjct: 1089965 RGLEDIQALIEQTKYTPSMARF---KIFIIDEVHMLTPQAANALLKTLEEPPSYVKFILA 1090135

Query: 145 TREPERLLATLRSR 158 

T +P +L AT+ SR 
Sbjct: 1090136 TTDPLKLPATVLSR 1090177 



gnl|OUACGT|A.actin_Contig753 Actinobacillus actinomycetemcomitans unfinished fragment of
genome 

Length = 7256 

Score = 56.2 bits (133), Expect = 2e-07 

Identities = 38/151 (25%), Positives = 58/151 (38%) 

Frame = -1 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

L + R HHA L G+G ++ + + L C + CG C C ++ G 

Sbjct: 4589 LANGLRENRLHHAYLFSGTRGVGKTSIARLFAKGLNCVSGVTAEPCGVCEHCNAIEKGNF 4410 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

D + + V+ RE+ + + LG KV + E 

Sbjct: 4409 IDLIEI---DAASRTKVEDTRELLDNVQYKPVLGRYKVYLIDEVHMLSRHSFNALLKTLE 4239

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHYL 164 

EPP F LAT + P +L T+ SRC +L 
Sbjct: 4238 EPPEYVKFLLATTDPHKLPVTILSRCMQFHL 4146



gnl|TIGR|gmt3732 Mycobacterium tuberculosis unfinished fragment of complete genome
Length = 466170

Score = 55.8 bits (132), Expect = 2e-07

Identities = 44/150 (29%), Positives = 58/150 (38%)

Frame = +3

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

L + AGR +HA L G G + L+R LCQ CGCC+A 

Sbjct: 394476 LSVALDAGRINHAYLFSGPRGCGKTSSARILARSLNCAQGPTANPCGVCESCVSL-APNA 394652



Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

P + + GVD RE+ ++ +V V E 






Sbjct: 394653 PGSIDWELDAASHGGVDDTRELRDRAFYAPVQSRYRVFIVDEAHMVTTAGFNALLKIVE 394832

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHY 163 

EPP F AT EPE++L T+RSR HY 
Sbjct: 394833 EPPEHLIFIFATTEPEKVLPTIRSRTH-HY 394919



gnl|GTC|C.aceto_AE001437 Clostridium acetobutylicum, WORKING DRAFT SEQUENCE, 1 ordered e
Length = 3943874 

Score = 55.4 bits (131), Expect = 3e-07 

Identities = 36/136 (26%), Positives = 53/136 (38%) 

Frame = +3 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

HA L+ G G LS + + C PQ + C C C+ + AG D L 

Sbjct: 3734367 HAYLMCGTRGTGKTTTAKILSKAVNCLNPQDGEPCNECEMCKKINAGIAIDVTELDAASN 3734546 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

+ VD +R + + + KV + EEPP F LA 

Sbjct: 3734547 NS VDDIRNI IDDVQYPPHESKFICVYI IDEVHMLSQGAVNAFLKTLEEPPQNWFI LA 3734717 

Query: 145 TREPERLLATLRSRCR 160 

T + P+ + L T+ SRC+ 
Sbjct: 3734718 TTDPQKLPVTILSRCQ 3734765 



Score =44.1 bits (102), Expect = 6e-04 
Identities = 23/80 (28%), Positives = 39/80 (48%) 
Frame = +3 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

K ++ VD VR++ E++N+ G K+ + V EEPP + L 

Sbjct: 14097 KKSISVDQVRKIIEEVNKKPYEGNNKLIWHDMDYMTIQGQNAFLKTIEEPPLGVYIILL 14276 

Query: 145 TREPERLLATLRSRCRLHYL 164 

+ R+L T+RSRC+++ L 
Sbjct: 14277 CQSQGRVLDTVRSRCQIYKL 14336 



gnl|TIGR|S.pneumoniae_sp_36 Streptococcus pneumoniae unfinished fragment of complete ger
Length = 43015 

Score = 54.7 bits (129), Expect = 4e-07 

Identities = 35/165 (21%), Positives = 66/165 (39%) 

Frame = +1 

Query: 6 WLRPDFEKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGC 65

W F+ + V + + +HA L + L++ L C G C CR C 

Sbjct: 23515 WQPAQFDRFVRILEQDQLNHAYLFSGF--FESLEMAQFLAKSLFCTDKVGVLPCEKCRSC 23688 

Query: 66 QLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXX 125

+L++ G PD + P + + +RE+ + ++ +V + 

Sbjct: 23689 KLIEQGEFPDVTLIKPV--NQVIKTERIRELVGQFSQAGIESQQQVFIIEQADKMHPNAA 23862 

Query: 126 XXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHYLAGPPEQ 170 

EEP +E + F T + E++L T+RSR ++ + E+ 
Sbjct: 23863 NSLLKVIEEPQSEVYIFFLTSDEEKMLPTIRSRTQIFHFKKQEEK 23997 



gnl|TIGR|V.cholerae_asm864 Vibrio cholerae unfinished fragment of complete genome



Length = 23778 



Score = 54.3 bits (128), Expect = 6e-07 

Identities = 37/143 (25%), Positives = 55/143 (37%) 

Frame = -3 

Query: 22 RGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAP 81

R HHA L G+G + ++ L C+ CG C CQ + G D + 

Sbjct: 14509 RLHHAYLFSGTRGVGKTTIGRLFAKGLNCETGITATPCGQCATCQEIDQGRFVDLLEI-- 14336 

Query: 82 EKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWF 141

+ V+ RE+ + + G KV + EEPP F 

Sbjct: 14335 -DAASRTKVEDTRELLDNVQYKPARGRFKVYLIDEVHMLSRHSFNALLKTLEEPPEYVKF 14159 

Query: 142 FLATREPERLLATLRSRCRLHYL 164 

LAT + P++L T+ SRC +L 
Sbjct: 14158 LLATTDPQKLPVTILSRCLQFHL 14090 



gnl|TIGR|P.gingivalis_1209 Porphyromonas gingivalis W83 unfinished fragment of complete
Length = 276255 

Score =53.9 bits (127), Expect = 8e-07 

Identities = 36/137 (26%), Positives = 59/137 (42%), Gaps = 2/137 (1%) 
Frame = +2 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQ--PQGHKSCGHCRGCQLMQAGTHPDYYTLAPE 82

HA L G+G + +R + C + P G ++CG C C + + Y L 

Sbjct: 15524 HAYLFCGPRGVGKTSCARIFARAINCLERLPDG-EACGRCESCKAFDEQRSMNIYELDAA 15700 

Query: 83 KGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFF 142

+ VD +R + E+ N ++G K+ + EEPP+ F 

Sbjct: 15701 SNNS---VDDIRLLIEQANVPPQIGKYKIYIIDEVHMLSQQAFNAFLKTLEEPPSYVIFI 15871

Query: 143 LATREPERLLATLRSRCRL 161 

LAT E ++L T+ SRC + + 
Sbjct: 15872 LATTEKHKILPTILSRCQI 15928 



gnl|Sanger_1765|mbovis_Contig454.1 Mycobacterium bovis unfinished fragment of complete c
Length = 1934

Score = 53.5 bits (126), Expect = 1e-06

Identities = 42/150 (28%), Positives = 57/150 (38%)

Frame = -3

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

L + AGR +HA L G G + L+R L C Q CG C C + 

Sbjct: 1185 LSVALDAGRINHAYLFSGPRGCGKTSSARILARSLNCAQGPTANPCGVCESCVSLAPNAL 1006

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

+ + + GVD RE+ ++ +V V E 

Sbjct: 1005 GSIDWELDAASHG-GVDDTRELRDRAFYAPVQSRYRVFIVDEAHMVTTAGFNALLKIVE 829 

Query: 134 EPPAETWFFLATREPERLLATLRSRCRLHY 163 

EPP F AT EPE++L T+RSR HY 
Sbjct: 828 EPPEHLIFIFATTEPEKVLPTIRSRTH-HY 742 



gb|AE000520|AE000520 Treponema pallidum complete genome
Length = 1138011

Score = 53.1 bits (125), Expect = 1e-06

Identities = 38/147 (25%), Positives = 60/147 (39%) 

Frame = -3 



Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73 

LS+ + AL GG + L++ LCQ + +CGC C+ + GT+ 

Sbjct: 1094869 LQKSLEENKVSPAYLFSGPHGCGKTSCARILAKALNCVQREASEPCGECPSCREIATGTN 1094690 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

+ + G + GV VR++ E++ KV + E 

Sbjct: 1094689 LNVIEI DGASHTGVGDVRQ I KEE I LF P PHGTRYKVF 1 1 DEVHML SNS AFNAL LKT I E 1094519 

Query: 134 EPPAETWFFLATREPERLLATLRSRCR 160

EPP F AT E R+ AT++SRC+ 
Sbjct: 1094518 EPP PYWF I F ATTEVHRI PATVKSRCQ 1094438 



gb|AE000511|HPYL Helicobacter pylori 26695 complete genome
Length = 1667867 

Score =51.5 bits (121), Expect = 4e-06 

Identities = 38/152 (25%), Positives = 58/152 (38%) 

Frame = -1 



Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

+A L L G G + +R L+C + + C C CQ H D + G 
Sbjct: 772097 NAYLFSGLRGSGKTSSSRIFARALMCEEGPKAVPCDTCIQCQSALNNHHIDIIEM---DG 771927

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

+ G+D VR + E+ G K+ + EEPP+ F LA 

Sbjct: 771926 ASNRGIDDVRNLIEQTRYKPSFGRYKIFIIDEVHMFTTEAFNALLKTLEEPPSHVKFLLA 771747 

Query: 145 TREPERLLATLRSRCRLHYLAGPPEQYAVTWL 176 

T + +L AT+ SR + PE ++ L 

Sbjct: 771746 TTDALKLPATILSRTQHFRFKKIPENSVISHL 771651



gnl|TIGR|S.aureus_2202 Staphylococcus aureus COL unfinished fragment of complete genome 
Length = 30502 

Score = 51.2 bits (120), Expect = 5e-06 

Identities = 32/136 (23%), Positives = 52/136 (37%) 

Frame = -3 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

HA+ GG++ ++ + C +CCC++ GT+ D + 

Sbjct: 5951 HAYIFSGPRGTGKTSIAKVFAKAINCLNSTDGEPCNECHICKGITQGTNSDVIEIDAASN 5772 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

GVD +R + +K+ KV + EEPPA F LA 

Sbjct: 5771 N---GVDEIRNIRDKVKYAPSESKYKYYIIDEVHMLTTGAFNALLKTLEEPPAHAIFILA 5601

Query: 145 TREPERLLATLRSRCR 160 

T EP ++ T+ SR + 
Sbjct: 5600 TTEPHKIPPTIISRAQ 5553 



gnl|OUACGT|S.aureus_Contig1164 Staphylococcus aureus unfinished fragment of complete ger
Length = 1224 



Score = 51.2 bits (120), Expect = 5e-06 

Identities = 32/136 (23%), Positives = 52/136 (37%) 

Frame = +2 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 

HA+ GG++ +++C +CCC++ GT+ D + 

Sbjct: 740 HAYIFSGPRGTGKTSIAKVFAKAINCLNSTDGEPCNECHICKGITQGTNSDVIEIDAASN 919 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

GVD +R + +K+ KV + EEPPA F LA 

Sbjct: 920 N---GVDEIRNIRDKVKYAPSESKYKVYIIDEVHMLTTGAFNALLKTLEEPPAHAIFILA 1090

Query: 145 TREPERLLATLRSRCR 160 

T EP + + T+ SR + 
Sbjct: 1091 TTEPHKIPPTIISRAQ 1138



gb|AE001439|AE001439 Helicobacter pylori, strain J99 complete genome
Length = 1643831

Score = 50.0 bits (117), Expect = 1e-05

Identities = 38/152 (25%), Positives = 57/152 (37%)

Frame = -3

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84

+A L L G G + +R L+C+ C C CQ H D + G 
Sbjct: 734547 NAYLFSGLRGSGKTSSSRIFARALMCKTGPKAVPCDTCIQCQSALNNHHIDIIEM---DG 734377

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

+ G+D VR + E+ G K+ + EEPP+ F LA 

Sbjct: 734376 ASNRGIDDVRNLIEQTRYKPSFGRYKIFIIDEVHMFTTEAFNALLKTLEEPPSHVKFLLA 734197 

Query: 145 TREPERLLATLRSRCRLHYLAGPPEQYAVTWL 176 

T + +L AT+ SR + PE ++ L 

Sbjct: 734196 TTDALKLPATILSRTQHFRFKKIPENSVISHL 734101



gnl|TIGR|N.meningitidis_GNMAB03R Neisseria meningitidis MC58 unfinished fragment of com
Length = 435

Score = 49.6 bits (116), Expect = 1e-05

Identities = 32/116 (27%), Positives = 50/116 (42%) 

Frame = +1 

Query: 44 LSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEH 103 

L++ L C+ Q + CG C+ C + AG + D + + NT G+D +REV E 
Sbjct: 58 LAKSLNCENAQHGEPCGVCKSCTQIDAGRYVDLLEI--DAASNT-GIDNIREVLENAQYA 228

Query: 104 ARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRC 159

G KV + + P + F LAT +P ++ T+ SRC 

Sbjct: 229 PTAGKYKVYIIDEGICFPKARSTLCSKRWKSRPNTSKFILATTDPHKVPVTVLSRC 396 



gnl|TIGR|S.pneumoniae_sp_68 Streptococcus pneumoniae unfinished fragment of complete ger 
Length = 21744 

Score = 49.2 bits (115), Expect = 2e-05 

Identities = 34/134 (25%), Positives = 51/134 (37%) 

Frame = -3 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 



HA L G G + + + + + C G + C +C CQ + G+ D + 

Sbjct: 17440 HAYLFSGPRGTGKTSVAKIFAKAMNCPNQVGGEPCNNCYICQAVTDGSLEDVIEMDAASN 17261 



Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

GVD +RE+ +K L KV + EEP F LA 

Sbjct: 17260 N GVDEIREIRDKSTYAPSLARYKVYIIDEVHMLSTGAFNALLKTLEEPTQNVVFILA 17090

Query: 145 TREPERLLATLRSR 158 

T E + + AT+ SR 
Sbjct: 17089 TTELHKIPATILSR 17048 

gnl|TIGR|C.tepidum_gct_35 Chlorobium tepidum unfinished fragment of complete genome 
Length = 33899 

Score = 48.0 bits (112), Expect = 4e-05 

Identities = 37/144 (25%), Positives = 58/144 (39%), Gaps = 8/144 (5%) 
Frame = +1 

Query: 17 SYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQ----PQGHKS----CGHCRGCQLM 68

S + GR H + L G+G ++ + CQ+ PQ K CG C C+ 

Sbjct: 29680 SLRMGRVGHGYIFSGLRGVGKTTAARVFAKAVNCQRMIDDPQYLKEVTEPCGVCESCRDF 29859 

Query: 69 QAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXX 128

AG ++ + VD +R + E + + G +V + 

Sbjct: 29860 DAGAS LNISEFDAASNNSVDDIRLLRENVRYGPQKGRYRVYIIDEVHMLSTAAFNAF 30030 

Query: 129 XXXXEEPPAETWFFLATREPERLLATLRSRCR 160 

EEPP F AT E ++ AT+ SRC+ 
Sbjct: 30031 LKTLEEPPPHAIFIFATTELHKIPATIASRCQ 30126 

gnl|TIGR|t_ferrooxidans_64 Thiobacillus ferrooxidans unfinished fragment of complete ger 
Length = 4609 

Score = 48.0 bits (112), Expect = 4e-05 

Identities = 29/115 (25%), Positives = 45/115 (38%) 

Frame = -3 

Query: 45 SRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHA 104

+ + L C + + CG C C+ + AG D L + VD R++ + + 

Sbjct: 4607 AKC LNC ERGVS SNPCGEC S ACRS I AAGNFVD LLEVDAASRTRVDETRDLLDNVQYAP 4437 

Query: 105 RLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATREPERLLATLRSRC 159

G K + EEPP F LAT +P++L T+ SRC 

Sbjct: 4436 TAGRYKAYLIDEVHMLSAHSFNALLKTLEEPPEHVKFLLATTDPQKLPITVLSRC 4272

gnl|OUACGT|Ngon_Contig196 Neisseria gonorrhoeae unfinished fragment of complete genome 
Length = 23501 

Score = 47.3 bits (110), Expect = 7e-05 
Identities = 24/87 (27%), Positives = 41/87 (46%) 
Frame = -1 

Query: 90 VDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATREPE 149

+DAVRE+ + + + GG +V+ + EEPP + F L + + 

Sbjct: 23495 IDAVREIIDNVYLTSVRGGLRVILIHPAESMNVQAANSLLKVLEEPPPQVVFLLVSHAAD 23316

Query: 150 RLLATLRSRCRLHYLAGPPEQYAVTWL 176 
++L T++SRCR LP A+ +L 



Sbjct: 23315 KVLPTIKSRCRKMVLPAPSHGEALAYL 23235



gnl|TIGR|D.radiodurans_8813 Deinococcus radiodurans unfinished fragment of complete gene 
Length = 83236 

Score = 47.3 bits (110), Expect = 7e-05 

Identities = 40/136 (29%), Positives = 59/136 (42%), Gaps = 20/136 (14%) 
Frame = -3 

Query: 23 GHHALLIQALPGMGDDALIYALSRYLLCQQPQGH--KSCGHCRGCQLMQAGTHPDYYTLA 80 

G +ALL+ +G L YA++ C P+G ++CG C C+ +QAG HPD L 

Sbjct: 54530 GGNALLLSGPARVGKLDLAYAIAAQHNCSGPRGMYGEACGQCPSCRALQAGAHPDVLRLE 54351

Query: 81 PEKGKNT LG VD A VR EVTEKLNE HAR LGGAKVVWVXXXXXXXX 122

P +T + + AV E + E+ +W V 

Sbjct: 54350 PRATTSTGKAARKRIIPIGAVLESRDTGREYETHVYEFLEVRPTFERRWIVAGAEYLNP 54171 

Query: 123 XXXXXXXXXXEEPPAETWFFLATREPERLLATLRSR 158 

EEPP F + +L T+ SR 

Sbjct: 54170 QAANALLKLVEEPPHRALFLFLAEDLRSVLPTIVSR 54063 



gb|AE000783|AE000783 Borrelia burgdorferi complete genome 
Length = 910724 

Score = 47.3 bits (110), Expect = 7e-05 

Identities = 32/149 (21%), Positives = 59/149 (39%) 

Frame = +2 

Query: 12 EKLVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAG 71 

EL S + + +A + G+G + A +R L C+ CG C C+ + + 

Sbjct: 482678 ETLKHSIEKNKIANAYIFSGPRGVGKTSSARAFARCLNCRNGPTVMPCGECSNCKSIEND 482857

Query: 72 THPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXX 131

+ D + G + V +R+ + E+ + + ++ + 

Sbjct: 482858 SSLDWEI DGASNTSVQDIRQIKEEIMFPPAI SKYRI YI IDEVHMLSNSAFNALLKT 483028 

Query: 132 XEEPPAETWFFLATREPERLLATLRSRCR 160 

EEPP F AT E +L T++SRC+ 
Sbjct: 483029 IEEPPNYIVFIFATTESHKLPETIKSRCQ 483115 



gnl|TIGR|t_ferrooxidans_1967 Thiobacillus ferrooxidans unfinished fragment of complete c 
Length = 563 

Score = 45.7 bits (106), Expect = 2e-04 

Identities = 23/57 (40%), Positives = 30/57 (52%), Gaps = 1/57 (1%) 
Frame = -3 

Query: 44 LSRYLLCQQPQGHK-SCGHCRGCQLMQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKL 100 

L + LC P CG CR C + L+ G HPD + PE GK + ++AVR E L 

Sbjct: 558 LQQVALCFAPTAQGLPCGTCRSCRLLAEGNHPDLLMITPETGKR-IAIEAVRHANEFL 388 



gnl|TIGR|gef_6250 Enterococcus faecalis unfinished fragment of complete genome 
Length = 24587 

Score = 45.3 bits (105), Expect = 3e-04 

Identities = 31/134 (23%), Positives = 48/134 (35%) 

Frame = -2 



Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 

HAL GG+ + + + C+Q+CCC + G D + 

Sbjct: 5419 HAYLFTGPRGTGKTSAAKIFAKAINCKHSQDGEPCNVCETCVAITEGRLNDVIEIDAASN 5240 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144 

GV+ +R++ +K KV + EEPP F LA 

Sbjct: 5239 N GVEEIRDIRDKAKYAPTQAEYKVYIIDEVHMLSTGAFNALLKTLEEPPQNVIFILA 5069

Query: 145 TREPERLLATLRSR 158 

T EP + + T+ SR 
Sbjct: 5068 TTEPHKIPLTIISR 5027 



emb|AJ235269|RPXXO Rickettsia prowazekii strain Madrid E, complete genome 
Length = 1111523 

Score = 42.2 bits (97), Expect = 0.003 

Identities = 29/137 (21%), Positives = 52/137 (37%), Gaps = 4/137 (2%) 
Frame = +2 



Query: 28 LIQALPGMGDDALIYALSRYLLCQ QPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEK 83

L+ + G+G + + + + C + K+C C C HPD + 
Sbjct: 1091072 LLTGIRGIGKTTSARIIAKAVNCSALITENTAIKTCEKCTNCVSFNNHNHPDIIEI D 1091242 

Query: 84 GKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFL 143

+ +D +R + E G K+ + EEPP F 

Sbjct: 1091243 AASKTSIDDIRRIIESAEYKPLQGKHKIFIIDEVHMLSKGAFNALLKTLEEPPPHVIFIF 1091422 

Query: 144 ATREPERLLATLRSRCRLHYL 164 

AT E + + + +T+ SRC+ + L 
Sbjct: 1091423 ATTEVQKVPSTIISRCQRYDL 1091485



gnl|OUACGT|Spyogenes_Contig260 Streptococcus pyogenes unfinished fragment of complete 
Length = 36214 

Score = 41.8 bits (96), Expect = 0.003 

Identities = 32/145 (22%), Positives = 53/145 (36%) 

Frame = +3 

Query: 14 LVASYQAGRGHHALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTH 73

L + + +G+ HAL GG+ +++C +CCC++G+ 

Sbjct: 33432 LKQAVESGKISHAYLFSGPRGTGKTSAAKIFAKAMNCPNQVDGEPCNQCDICRDITNGSL 33611 

Query: 74 PDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXE 133

D + GVD +R+ + +K KV + E 

Sbjct: 33612 EDVI E I DAASNN GVDEIRDIRDKSTYAPSRATYKVYIIDEVHMLSTGAFNALLKTLE 33782 

Query: 134 EPPAETWFFLATREPERLLATLRSR 158 

EP F LAT E + + AT+ SR 

Sbjct: 33783 EPTENVVFILATTELHKIPATILSR 33857



gnl|OUACGT|Ngon_Contig166 Neisseria gonorrhoeae unfinished fragment of complete genome 
Length = 9825 

Score = 41.4 bits (95), Expect = 0.004 
Identities = 16/37 (43%), Positives = 25/37 (67%) 
Frame = +1 



Query: 4 YPWLRPDFEKLVASYQAGRGHHALLIQALPGMGDDAL 40 

YPWL P + + + + + G GHHA+LI+A G+G + L 
Sbjct: 4321 YPWLMPIYHQIAQTFDEGLGHHAVLIKADAGLGVERL 4431



gnl|UOKNOR|S.mutans_Contig762 Streptococcus mutans unfinished fragment of complete genon 
Length = 3001 

Score = 40.6 bits (93), Expect = 0.007 

Identities = 31/134 (23%), Positives = 47/134 (34%) 

Frame = -1 

Query: 25 HALLIQALPGMGDDALIYALSRYLLCQQPQGHKSCGHCRGCQLMQAGTHPDYYTLAPEKG 84 

HAL GG+ ++ + C +C+CC + G+D + 

Sbjct: 1519 HAYLFSGPRGTGKTSAAKIFAKAMNCPHQADGEPCNNCDICHDITNGSLEDVIEIDAASN 1340 

Query: 85 KNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLA 144

GVD +RE+ +K KV + EEP F LA 

Sbjct: 1339 N GVDEIREIRDKSTYAPSRATYKVYIIDEVHMLSTGAFNALLKTLEEPTENVVFILA 1169

Query: 145 TREPERLLATLRSR 158 

T E ++ AT+ SR 
Sbjct: 1168 TTELHKIPATILSR 1127 



gnl|TIGR|C.trachomatis_ct_26 Chlamydia trachomatis MOPN unfinished fragment of complete 
Length = 2341 

Score = 38.7 bits (88), Expect = 0.028 
Identities = 21/87 (24%), Positives = 35/87 (40%) 
Frame = +2 

Query: 73 HPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXX 132

HPD Y + P+ + + R + + + H K+ + 

Sbjct: 905 HPDIYEYSPQGKGRLHTIETPRAIRKNIWIHPYESSYKIYIIYEADRISLDAISAFLKLL 1084 

Query: 133 EEPPAETWFFLATREPERLLATLRSRC 159 

E+PP + F L + P+RL T+RSRC 
Sbjct: 1085 EDPPYYSIFILVSALPQRLPPTIRSRC 1165



gnl|TIGR|S.aureus_2184 Staphylococcus aureus COL unfinished fragment of complete genome 
Length = 12112 

Score =38.3 bits (87), Expect = 0.037 

Identities = 35/152 (23%), Positives = 64/152 (42%), Gaps = 6/152 (3%) 
Frame = -3 

Query: 12 EKLVASYQAGRGHHALLIQALPGMGDDA LIYALSRYLLCQQPQGHKSCGHCRGCQ 66 

++L +Y + + HA L + GDDA + + + +LCQ C+ 
Sbjct: 974 QQLTNAYHSNKLSHAYLFE GDDAQTMKQVAINFAKLILCQTDXQ CE 837 

Query: 67 L-MQAGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXX 125

+ HPD+ ++ + N + + V ++ +N+ KV + 

Sbjct: 836 XKVSTYNHPDFMYISTTE--NAIKKEQVEQLVIIHMNQLPIESTNKVYIIEDFEKLTVQGE 663 

Query: 126 XXXXXXXEEPPAETWFFLATREPERLLATLRSRCRLHY 163

EEPP T L + +PE++L T+ SRC+ Y 
Sbjct: 662 NSILKFLEEPPDNTIAILLSTKPEQILDTIHSRCQHVY 549 



gnl|CBCUMN|Pmultocida.990513.Contig705 Pasteurella multocida PM70 unfinished fragment of 
Length = 3829 

Score = 36.0 bits (81), Expect = 0.19 

Identities = 22/81 (27%), Positives = 33/81 (40%) 

Frame = +1 

Query: 90 VDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXXXXXEEPPAETWFFLATREPE 149

V+ RE+ + + G KV + EEPP F LAT + P+ 

Sbjct : 58 VEDTRELLDWQYKPVQGRYKVYLIDEVHMLSRHSFNALLKTLEEPPEYVKFLLATTDPQ 237 

Query: 150 RLLATLRSRCRLHYLAGPPEQ 170

+L T+ SRC +L +Q 
Sbjct: 238 KLPITILSRCMQFHLKALEQQ 300 



gb|AB001339|SYNECHO Synechocystis PCC6803 complete genome 
Length = 3573470 

Score = 35.6 bits (80), Expect = 0.25 

Identities = 15/28 (53%), Positives = 20/28 (70%) 

Frame = -1 

Query: 133 EEPPAETWFFLATREPERLLATLRSRCR 160 

EEPP F LAT +P+R+L T+ SRC+ 
Sbjct: 1067285 EEPPERVVFVLATTDPQRVLPTIISRCQ 1067202



gnl|TIGR|M.avium_5418 Mycobacterium avium unfinished fragment of complete genome 
Length = 17971 

Score = 32.5 bits (72), Expect = 2.1 

Identities = 14/28 (50%), Positives = 17/28 (60%) 

Frame = -3 

Query: 29 IQALPGMGDDALIYALSRYLLCQQPQGH 56

+ LP +GDDA+ R LL QQP GH 

Sbjct: 8381 VDRLPAVGDDAVHQLARRQLLTQQPDGH 8298 



gnl|TIGR|C.crescentus_gcc_2104 Caulobacter crescentus unfinished fragment of complete ge 
Length = 826 

Score = 32.1 bits (71), Expect = 2.8 

Identities = 35/140 (25%), Positives = 54/140 (38%), Gaps = 11/140 (7%) 
Frame = -2 

Query: 21 GRGHH AL L I Q AL PGMGDDA L I YAL S R YLLCQQ P QGHKSCGHCRGCQLMQ 69 

GR HA ++ + G+G L+R L + +G+ HCR + 
Sbjct: 822 GRIAHAFMLTGVRGVGKTTTARLLARALNYETDTVKGPSVDLTTEGY HCRS II 664 

Query: 70 AGTHPDYYTLAPEKGKNTLGVDAVREVTEKLNEHARLGGAKVVWVXXXXXXXXXXXXXXX 129

G H D L + VD +RE+ + + KV + 

Sbjct: 663 EGRHMDVLEL DAASRTKVDEMRELLDGVRYAPVEARYKVYIIDEVHMLSTAAFNALL 493 

Query: 130 XXXEEPPAETWFFLATREPERLLATLRSRCR 160 

EEPP F AT E ++ T+ SRC + 
Sbjct: 492 KTLEEPPPHAKFIFATTEIRKVPVTILSRCQ 400 



gnl|TIGR|V.cholerae_asm959 Vibrio cholerae unfinished fragment of complete genome 
Length = 15780 

Score = 30.5 bits (67), Expect = 8.3 

Identities = 13/41 (31%), Positives = 22/41 (52%) 

Frame = +3 

Query: 219 ALAYSVPSGDWYSLLAALNHEQAPARLHWLATLLMDALKRH 259 

+ +A S P+G+W + + A + W+ATL D L R+ 
Sbjct: 1191 SVALSTPNGEWGQTVKFVRRFSAQEQKEWIATLAADMLLRY 1313 



CPU time: 0.64 user secs. 1.41 sys. secs 2.05 total secs. 

Database: Unfinished Actinobacillus actinomycetemcomitans 

Posted date: Dec 30, 1998 1:59 PM 
Number of letters in database: 1,888,023 
Number of sequences in database: 537 

Database: Complete Aquifex aeolicus 

Posted date: Aug 5, 1998 9:38 AM 
Number of letters in database: 1,551,335 
Number of sequences in database: 1 

Database: Complete Bacillus subtilis 
Posted date: Aug 5, 1998 9:38 AM 
Number of letters in database: 4,214,814 
Number of sequences in database: 1 

Database: Unfinished Bordetella pertussis 

Posted date: May 3, 1999 3:37 PM 
Number of letters in database: 3,987,145 
Number of sequences in database: 543 

Database: Borrelia burgdorferi 

Posted date: Aug 5, 1998 9:38 AM 
Number of letters in database: 1,229,458 
Number of sequences in database: 12 

Database: Unfinished Campylobacter jejuni 

Posted date: Nov 17, 1998 10:56 AM 
Number of letters in database: 1,641,480 
Number of sequences in database: 1 

Database: Complete Chlamydia trachomatis 

Posted date: Aug 14, 1998 4:20 PM 
Number of letters in database: 1,042,519 
Number of sequences in database: 1 

Database: Unfinished Chlorobium tepidum 

Posted date: Feb 8, 1999 10:29 AM 
Number of letters in database: 2,257,254 
Number of sequences in database: 254 

Database: Unfinished Clostridium acetobutylicum 

Posted date: Mar 31, 1999 10:56 AM 
Number of letters in database: 3,943,874 
Number of sequences in database: 1 

Database: Unfinished Caulobacter crescentus 

Posted date: Feb 8, 1999 11:17 AM 
Number of letters in database: 4,177,031 



Number of sequences in database: 3481 



Database: Unfinished Chlamydia trachomatis MOPN 

Posted date: Feb 8, 1999 11:21 AM 
Number of letters in database: 1,160,971 
Number of sequences in database: 624 

Database: Unfinished Deinococcus radiodurans 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 3,615,037 
Number of sequences in database: 869 

Database: Complete Escherichia coli 

Posted date: Aug 5, 1998 9:37 AM 
Number of letters in database: 4,639,221 
Number of sequences in database: 1 

Database: Unfinished Enterococcus faecalis 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 3,209,119 
Number of sequences in database: 293 

Database: Complete Haemophilus influenzae Rd 

Posted date: Aug 5, 1998 9:37 AM 
Number of letters in database: 1,830,138 
Number of sequences in database: 1 

Database: Complete Helicobacter pylori 26695 

Posted date: Jan 25, 1999 3:20 PM 
Number of letters in database: 1,667,867 
Number of sequences in database: 1 

Database: Complete Helicobacter pylori J99 

Posted date: Jan 25, 1999 3:55 PM 
Number of letters in database: 1,643,831 
Number of sequences in database: 1 

Database: Unfinished Mycobacterium avium 

Posted date: May 17, 1999 1:55 PM 
Number of letters in database: 5,354,737 
Number of sequences in database: 692 

Database: Unfinished Mycobacterium bovis 

Posted date: May 10, 1999 1:17 PM 
Number of letters in database: 4,093,505 
Number of sequences in database: 931 

Database: Complete Mycoplasma pneumoniae 

Posted date: Aug 5, 1998 9:37 AM 
Number of letters in database: 816,394 
Number of sequences in database: 1 

Database: Unfinished Mycobacterium tuberculosis CSU#93 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 4,306,088 
Number of sequences in database: 42 

Database: Complete Mycobacterium tuberculosis H37Rv 

Posted date: Aug 14, 1998 4:20 PM 
Number of letters in database: 4,411,529 
Number of sequences in database: 1 



Database: Complete Mycoplasma genitalium 

Posted date: Aug 5, 1998 9:36 AM 
Number of letters in database: 580,073 
Number of sequences in database: 1 

Database: Unfinished Neisseria gonorrhoeae 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 2,172,011 
Number of sequences in database: 159 

Database: Unfinished Neisseria meningitidis MC58 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 1,406,901 
Number of sequences in database: 2533 

Database: Unfinished Neisseria meningitidis serogroup A 

Posted date: May 3, 1999 3:38 PM 
Number of letters in database: 2,166,687 
Number of sequences in database: 25 

Database: Unfinished Pseudomonoas aeruginosa 

Posted date: Mar 15, 1999 3:11 PM 
Number of letters in database: 6,246,116 
Number of sequences in database: 12 

Database: Unfinished Porphyromonas gingivalis W83 

Posted date: May 17, 1999 1:55 PM 
Number of letters in database: 2,334,787 
Number of sequences in database: 12 

Database: Unfinished Pasteurella multocida PM70 

Posted date: May 14, 1999 2:09 PM 
Number of letters in database: 4,166,549 
Number of sequences in database: 3506 

Database: Unfinished Pseudomonas putida 

Posted date: May 10, 1999 3:21 PM 
Number of letters in database: 201,388 
Number of sequences in database: 391 

Database: Complete Rickettsia prowazekii 

Posted date: Nov 16, 1998 3:20 PM 
Number of letters in database: 1,111,523 
Number of sequences in database: 1 

Database: Unfinished Staphylococcus aureus COL 

Posted date: May 6, 1999 2:33 PM 
Number of letters in database: 3,071,880 
Number of sequences in database: 2177 

Database: Unfinished Staphylococcus aureus 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 733,437 
Number of sequences in database: 506 

Database: Unfinished Streptococcus mutans 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 1,438,835 
Number of sequences in database: 514 



Database: Unfinished Shewanella putrefaciens 

Posted date: Feb 8, 1999 11:22 AM 
Number of letters in database: 5,974,789 
Number of sequences in database: 2430 

Database: Unfinished Streptococcus pyogenes 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 1,801,145 
Number of sequences in database: 181 

Database: Unfinished Streptococcus pneumoniae 

Posted date: Feb 8, 1999 10:31 AM 
Number of letters in database: 2,114,666 
Number of sequences in database: 270 

Database: Unfinished Salmonella typhi 

Posted date: May 3, 1999 3:38 PM 
Number of letters in database: 5,088,553 
Number of sequences in database: 185 

Database: Complete Synechocystis PCC6803 

Posted date: Aug 5, 1998 9:36 AM 
Number of letters in database: 3,573,470 
Number of sequences in database: 1 

Database: Unfinished Thiobacillus ferrooxidans 

Posted date: May 10, 1999 3:22 PM 
Number of letters in database: 3,488,401 
Number of sequences in database: 2870 

Database: Unfinished Thermotoga maritima 

Posted date: Feb 8, 1999 10:31 AM 
Number of letters in database: 2,352,161 
Number of sequences in database: 948 

Database: Complete Treponema pallidum 
Posted date: Aug 14, 1998 4:21 PM 
Number of letters in database: 1,138,011 
Number of sequences in database: 1 

Database: Unfinished Vibrio cholerae 
Posted date: Feb 8, 1999 10:31 AM 
Number of letters in database: 4,145,671 
Number of sequences in database: 694 

Database: Unfinished Yersinia pestis 
Posted date: May 3, 1999 3:38 PM 
Number of letters in database: 4,937,945 
Number of sequences in database: 209 



Lambda K H 

0.322 0.137 0.00 

Gapped 

Lambda K H 

0.270 0.0470 4.94e-324 



Matrix: BLOSUM62 

Gap Penalties: Existence: 11, Extension: 1 
Number of Hits to DB : 58759890 



Number of Sequences: 537 
Number of extensions: 944282 
Number of successful extensions: 4967 
Number of sequences better than 10.0: 144 
Number of HSP's better than 10.0 without gapping: 69 
Number of HSP's successfully gapped in prelim test: 7 
Number of HSP's that attempted gapping in prelim test: 4565 
Number of HSP's gapped (non-prelim): 857 
length of query: 334 
length of database: 40,975,456 
effective HSP length: 48 
effective length of query: 286 
effective length of database: 39731536 
effective search space: 11363219296 
effective search space used: 11363219296 
frameshift window, decay const: 50, 0.1 
T: 13 
A: 40 
X1: 16 ( 7.4 bits) 
X2: 38 (14.8 bits) 
X3: 64 (24.9 bits) 
S1: 41 (21.9 bits) 
S2: 66 (30.1 bits) 
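
The search-space figures reported above can be reproduced from the other values in the same statistics block. The sketch below assumes the usual relationships: effective query length is the query length minus the effective HSP length, and the search space is the effective query length multiplied by the effective database length. The relation E = m·n·2^(−S′) is the standard one for bit scores; because the bit scores printed in this listing are rounded, the result agrees with the listed Expect values only approximately. Variable and function names are illustrative.

```python
# Values copied from the statistics block above.
query_len = 334          # "length of query"
eff_hsp_len = 48         # "effective HSP length"
eff_db_len = 39_731_536  # "effective length of database"

eff_query_len = query_len - eff_hsp_len
search_space = eff_query_len * eff_db_len


def expect_from_bits(bit_score, space):
    """Expected number of chance hits scoring at least bit_score."""
    return space * 2.0 ** (-bit_score)


print(eff_query_len)
print(search_space)
print(f"{expect_from_bits(51.2, search_space):.1e}")
```

The first two printed values can be compared directly with the "effective length of query" and "effective search space" lines; the third is the approximate expectation for a 51.2-bit hit.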
WARNING: These microbial genomes are not yet finished, and are not 
yet in GenBank and are not presently distributed to EMBL or DDBJ. 
Please see details 



NOTE: This WWW-BLAST page utilizes NCBI's new gapped BLAST algorithm 
(Altschul et al., 1997) with the BLASTN, TBLASTN, and TBLASTX programs. 

Commencing search, please wait for results. 



You have searched a database generously provided by the Institute for Genomic Research 
(TIGR). Their Policy on Early Data Release is: 

The Institute for Genomic Research (TIGR) releases data very rapidly to ensure that our scientific colleagues have access to 
information that may assist them in the search for genes and their biological function. Data releases do not constitute scientific 
publication, but rather provide investigators with information that may "jump-start" biological experimentation. Users of this 
information are encouraged to share their results with TIGR in order to improve annotation of the sequence data. Data or 
information may contain errors or be incomplete and should be regarded as preliminary. 

TIGR asks that you acknowledge the source of information obtained from this site in any publication by including the following 
sentence in both the Materials and Methods and Acknowledgement sections: "Preliminary sequence data was obtained from The 
Institute for Genomic Research website at http://www.tigr.org." Also include the following text in the Acknowledgements, if 
applicable: "Sequencing of [organism name] was accomplished with support from [funding agency]." The name of the funding 
agency for each TIGR project can be found at http://www.tigr.org/tdb/mdb/mdb.html 

Similarly, if you display this data or any information derived from it on a Web page, we ask that you prominently display the 
following notice on that webpage: "Preliminary sequence data was obtained from The Institute for Genomic Research website at 
http://www.tigr.org." We request that you notify us of your electronic presentation by sending email to www@tigr.org. 



TBLASTN 2.0.8 [Jan-05-1999] 



Reference: 

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 

Query= 

(343 letters) 

Searching done 

If you have any problems or questions with the results of this search 
please refer to the BLAST FAQ. 



Sequences producing significant alignments:                              Score    E 
                                                                         (bits)  Value 

gb|U00096|ECOLI Escherichia coli K-12 MG1655 complete genome               619   e-177 
gnl|Sanger|S.typhi_Contig403 Salmonella typhi unfinished fragmen...        563   e-160 
gnl|Sanger|Y.pesits_Contig765 Yersinia pestis unfinished fragmen...        447   e-125 
gnl|TIGR|V.cholerae_asm937 Vibrio cholerae unfinished fragment o...        282   1e-75 
gb|L42023|L42023 Haemophilus influenzae Rd complete genome                 237   3e-62 
gnl|OUACGT|A.actin_Contig739 Actinobacillus actinomycetemcomitan...        210   3e-58 
gnl|TIGR|S.putrefaciens_gsp_230 Shewanella putrefaciens unfinish...        194   3e-49 
gnl|PAGP|Paeruginosa_Contig52 Pseudomonas aeruginosa unfinished ...        139   1e-32 
gnl|TIGR|t_ferrooxidans_626 Thiobacillus ferrooxidans unfinished...              1e-28 
gnl|Sanger|B.pertussis_Contig669 Bordetella pertussis unfinished...        122   2e-27 
gnl|Sanger|N.mening_Contig363 Neisseria meningitidis serogroup A...        115   2e-25 
gnl|OUACGT|Ngon_Contig213 Neisseria gonorrhoeae unfinished fragm...        109   1e-23 
gnl|TIGR|D.radiodurans_8857 Deinococcus radiodurans unfinished f...         38   0.064 
gnl|PAGP|Paeruginosa_Contig44 Pseudomonas aeruginosa unfinished ...         31   8.2 
gnl|Sanger_1765|mbovis_Contig976.0 Mycobacterium bovis unfinishe...              8.2 
gnl|TIGR|gmt3661 Mycobacterium tuberculosis unfinished fragment ...         31   8.2 
emb|AL123456|MTBH37RV Mycobacterium tuberculosis H37Rv complete ...              8.2 

gb|U00096|ECOLI Escherichia coli K-12 MG1655 complete genome 
Length = 4639221 

Score = 619 bits (1578), Expect = e-177 
Identities = 312/343 (90%), Positives = 312/343 (90%) 
Frame = -3 

Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 60 

MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 
Sbjct: 670828 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 670649 

Query: 61 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKA 120 

WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQ IVRGNKLSKA 
Sbjct: 670648 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQLLTLTGLLHDDLLLIVRGNKLSKA 670469 

Query: 121 QENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLA 180 

QENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLA 
Sbjct: 670468 QENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLA 670289 

Query: 181 LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 240 

LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 
Sbjct: 670288 LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 670109 

Query: 241 SEPVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300 

SEPVI QSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 
Sbjct: 670108 SEPVILLRTLQRELLLLVNLKRQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 669929 

Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADVFIDG 343 

AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADVFIDG 
Sbjct: 669928 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADVFIDG 669800 
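
Because TBLASTN compares a protein query against translated nucleotide sequence, each aligned subject residue spans three nucleotides, and a negative frame runs the subject coordinates downward. A minimal sketch of that bookkeeping, using coordinates from the E. coli alignment above (the function names `subject_residues` and `strand` are assumptions chosen for illustration):

```python
def subject_residues(start, end):
    """Number of translated residues covered by subject nucleotide coordinates."""
    span = abs(end - start) + 1
    if span % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return span // 3


def strand(start, end):
    """'+' when coordinates ascend, '-' when they descend (reverse frames)."""
    return "+" if end >= start else "-"


# First and last lines of the E. coli alignment (Frame = -3).
print(subject_residues(670828, 670649), strand(670828, 670649))
print(subject_residues(669928, 669800))
```

The same check is a quick way to spot lines where the text extraction dropped or merged residues: an ungapped subject line should contain exactly the number of residues its coordinates imply.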

gnl|Sanger|S.typhi_Contig403 Salmonella typhi unfinished fragment of complete genome 
Length = 36914 

Score = 563 bits (1436), Expect = e-160 

Identities = 279/343 (81%), Positives = 298/343 (86%) 

Frame = -3 



Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 60 

MIRLYPEQLRAQLNE LRAAYLLLGNDPLLLQESQDA+R AA+QGFEEHH F++DP+TD 
Sbjct: 15489 MIRLYPEQLRAQLNEXLRAAYLLLGNDPLLLQESQDAIRLAAASQGFEEHHAFTLDPSTD 15310 



Query: 61 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKA 120 

W ++FSLCQAMSLFASRQTL+L LPENGPNAA+NEQ IVRGNKL+KA 
Sbjct: 15309 WGSLFSLCQAMSLFASRQTLVLQLPENGPNAAMNEQIiATLSELLHDDLLLIVRGNKLTKA 15130 

Query: 121 QENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLA 180

QENAAW+TALA+RSVQV+CQTPEQAQLPRWVAARAK NL + LDDAANQ+ LCYC YEGNLLA 
Sbjct: 15129 QENAAWYTALADRSVQVSCQTPEQAQLPRWVAARAKAQNLQLDDAANQLLCYCYEGNLLA 14950

Query: 181 LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 240 

LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 
Sbjct: 14949 LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 14770 

Query: 241 SEPVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300

SEPVI QSAHTPLRALFDKHRVWQNRR M+G+AL RL QLRQ 

Sbjct: 14769 SEPVILLRTLQRELLLLVNLKRQSAHTPLRALFDKHRVWQNRRPMIGDALQRLHPAQLRQ 14590 

Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADVFIDG 343 

AVQLLTRTE+TLKQDYGQSVWA+LEGLSLLLCHK LADVFIDG 
Sbjct: 14589 AVQLLTRTEITLKQDYGQSVWADLEGLSLLLCHKALADVFIDG 14461 



gnl|Sanger|Y.pesits_Contig765 Yersinia pestis unfinished fragment of complete genome 
Length = 215860 

Score = 447 bits (1138), Expect = e-125 

Identities = 223/342 (65%), Positives = 263/342 (76%) 

Frame = +3 

Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 60 

MIR+YPEQL AQL+EGLRA YLL GN+PLLLQESQD +R+VA+ F EH +F++D +T+ 
Sbjct: 50067 MIRIYPEQLVAQLHEGLRACYLLCGNEPLLLQESQDHIRRVASQHDFTEHFSFALDAHTE 50246 

Query: 61 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKA 120 

W IFSLCQA+SLFASRQTLLL P++G A I+EQ I+R NKL+KA 

Sbjct: 50247 WEHIFSLCQALSLFASRQTLLLSFPDSGLTAPISEQLVKLSGLLHPDILLILRANKLTKA 50426 

Query: 121 QENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLA 180

QEN+AWF AL+ V V+CQTPEQAQLPRWV+ARAK LNL +DDAA Q+LCYCYEGNLLA 
Sbjct: 50427 QENSAWFKALSKNGVFVSCQTPEQAQLPRWVSARAKSLNLNVDDAAIQLLCYCYEGNLLA 50606

Query: 181 LAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEG 240 

L+QALERLSLL+PDGKLTLP+VEQAVNDAAHFTP+HW+DALLMGKSKRA HILQQL+ E 
Sbjct: 50607 LSQALERLSLLYPDGKLTLPKVEQAVNDAAHFTPYHWLDALLMGKSKRAWHILQQLQQED 50786 

Query: 241 SEPVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300

SEPVI Q PLRALFD+H+ +WQNRR MM +AL RLS QL+Q 

Sbjct: 50787 SEPVILLRTVQRELLLLLALKRQMEQVPLRALFDQHKIWQNRRPMMTQALQRLSLQQLQQ 50966 

Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCHKPLADVFID 342 

AV LLT+ E+ LKQDYGQS+W ELE LS+L+C K L + F D 
Sbjct: 50967 AVHLLTQMEIRLKQDYGQSIWPELETLSMLMCGKTLPESFFD 51092 



gnl|TIGR|V.cholerae_asm937 Vibrio cholerae unfinished fragment of complete genome 
Length = 6994 

Score = 282 bits (714), Expect = 1e-75 

Identities = 151/332 (45%), Positives = 207/332 (61%), Gaps = 1/332 (0%) 
Frame = +2 



Query: 2 IRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDW 61 



+R+Y E+L L+ + L YL+ GN+PLLLQE++ A+ + A AQGF E H FS D DW 
Sbjct: 1166 MRIYAEKLAESLHKTLYPIYLVFGNEPLLLQEAKTAIEKTAQAQGFLEKHRFSADAGLDW 1345 

Query: 62 NAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKAQ 121 

NA++ CQA+SLF+SRQ + + +PE+G NA ++ +V G KL+KAQ 

Sbjct: 1346 NAVYDCCQALSLFSSRQLIEIEIPESGVNAQTAKELSALVGQLHQDILLLVIGPKLTKAQ 1525 

Query : 122 ENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLAL 181 

ENAAWF LA ++ V C TPE ++LP++V R L L+ D A Q+L +EGNL AL 
Sbjct: 1526 ENAAWFKTLAQQACWVNCLTPELSRLPQFVQQRCFALGLKPDAEAVQMLAQWHEGNLFAL 1705 

Query: 182 AQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGS 241 

AQ+LE+L+LL+PDG LTL R+E++++ HFTP+HW+DALL GK+ RA IL+QL LE S 
Sbjct: 1706 AQSLEKLALLYPIXjLLTLWLEESLSRJINHFTPYHWMDALLEGKANRAQRILRQLMLEES 1885 

Query: 242 EPVIXXXXXXXXXXXXXXXXXQSAHT-PLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300 

EP+I + L + LFD+ +RVWQNRR + AL RL L + 

Sbjct: 1886 EPIILIRTAQKELTQLLKWQQERQQLGNLGSLFDRYRVWQNRRPLYSAALQRLPSRALLR 2065 



Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLLCH 333 

V +LT+ EL K Y Q VW L+ LSL C+ 
Sbjct: 2066 LVGILTQAELLAKTQYEQPVWPILQQLSLECCN 2164



gb|L42023|L42023 Haemophilus influenzae Rd complete genome 
Length = 1830138 

Score = 237 bits (599), Expect = 3e-62 

Identities = 133/332 (40%), Positives = 182/332 (54%), Gaps = 12/332 (3%) 
Frame = +3 



Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTD 60 

M R+ + PEQL L +GL YLL G DPLLL E++D + QVA QGF+E +T +D TD 
Sbjct: 980328 MNRIFPEQLNHHLAQGLARVYLLQGQDPLLLSETEDTICQVANLQGFDEKNTIQVDSQTD 980507 

Query: 61 WNAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKA 120 

W + CQ++ LF S+Q L L LPEN A + + I++ KL+K 

Sbjct: 980508 WAQLIESCQSIGLFFSKQILSLNLPENF-TALLQKNLQELISVLHKDVLLILQVAKLAKG 980684 

Query: 121 QENAAWFTALAN---RSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGN 177

E WF L ++ + CQTP LPRWV R K + L+ D+ A Q LCY YE N 

Sbjct: 980685 IEKQTWFITLNQYEPNTILINCQTPTVENLPRWVKNRTKAMGLDADNEAIQQLCYSYENN 980864

Query: 178 LLALAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLR 237

LLAL QAL+ L LL+PD KL RV V ++ FTPF W+DALL+GK+ RA IL+ L+ 
Sbjct: 980865 LLALKQALQLLDLLYPDHKLNYNRVISVVEQSSIFTPFQWIDALLVGKANRAKRILKGLQ 981044

Query: 238 LEGSEPVIXXXXXXXXXXXXXXXXXQSAH TPLRALFDKHRVWQNRRGMMGE 288 

E +PVI ++ FD+ ++WQNRR + 

Sbjct: 981045 AEDVQPVILLRTLQRELFTLLELTKPQQRIVTTEKLPIQQIKTEFDRLKIWQNRRPLFLS 981224

Query: 289 ALNRLSQTQLRQAVQLLTRTELTLKQDYGQSVWAELEGLSLLLC 332 

A+ RL+ L + +Q L E KQ++ VW +L LS+ +C 
Sbjct: 981225 AIQRLTYQTLYEIIQELANIERLAKQEFSDEVWIKLADLSVKIC 981356

gnl|OUACGT|A.actin_Contig739 Actinobacillus actinomycetemcomitans unfinished fragment of genome 
Length = 4889 

Score = 210 bits (529), Expect(2) = 3e-58 
Identities = 124/297 (41%), Positives = 165/297 (54%), Gaps = 12/297 (4%) 
Frame = -3 

Query: 33 ESQDAVRQVAAAQGFEEHHTFSIDPNTDWNAIFSLCQAMSLFASRQTLLLLLPENGPNAA 92 

ES + + Q A +GF+E 1+ +TDWN +F Q+M LF ++Q ++L LPEN A 

Sbjct: 2601 ESANGIYQTALQRGFDEKVELDINASTDWNDLFEPVQSMGLFFNKQLIILDLPENA-TAL 2425 

Query: 93 INEQXXXXXXXXXXXXXXIVRGNKLSKAQENAAWFTALAN---RSVQVTCQTPEQAQLPR 149

+ + I R KL+KA E AWF A + +V V CQTP QLPR 

Sbjct: 2424 LQKNLSEFISLLQPDVLPIFRLAKLTKAAEKQAWFMAANQYEPQAVLVNCQTPNAEQLPR 2245 

Query: 150 WVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLSLLWPDGKLTLPRVEQAVNDA 209 

WVA RAK L L ++ A Q+LCY YE NLLAL Q L+ L LL+PD KLT RV V + 
Sbjct: 2244 WVANRAKMLGLSIEQEAVQLLCYSYENNLLALKQTLQLLDLLYPDRKLTFARVNSVVEQS 2065

Query: 210 AHFTPFHWVDALLMGKSKRALHILQQLRLEGSEPVIXXXXXXXXXXXXXXXXX QSA 265 

+ FTPF WVDA+L GK RA IL L+ E + P+I QS 
Sbjct: 2064 SVFTPFQWVDAILGGKGNRARRILTGLKDEDVQPIILLRTLQRDLMTLLEISKPEQPQSL 1885 

Query: 266 HTP LRALFDKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTRTELTLKQDYGQSV 320 

+ P LR FD+ +VWQNRR + +A+ RL+ +L Q L E KQ++ + 

Sbjct: 1884 DSPLPTDQLREQFDRLKVWQNRRSLFTQAVQRLTYRKLYLFFQQLADVERCAKQEFSDDI 1705 

Query: 321 WAELEGLSL 329 

W +LE LS+ 
Sbjct: 1704 WQQLEDLSV 1678 



Score = 36.0 bits (81), Expect(2) = 3e-58 
Identities = 17/31 (54%), Positives = 20/31 (63%) 
Frame = -2 

Query: 1 MIRLYPEQLRAQLNEGLRAAYLLLGNDPLLL 31 

M RL+PEQL + L L Y L+G DPLLL 
Sbjct: 2698 MNRLFPEQLASSLERHLAHVYFLVGEDPLLL 2606 



gnl|TIGR|S.putrefaciens_gsp_230 Shewanella putrefaciens unfinished fragment of complete 
Length = 21837

Score = 194 bits (489), Expect = 3e-49 

Identities = 121/341 (35%), Positives = 167/341 (48%), Gaps = 4/341 (1%) 
Frame = +2 

Query: 2 IRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDW 61 

+R+YP+QL LN LA YL+ G+DP LL+ S+D +RQ A QGFEE + +W 

Sbjct: 14210 MRVYPDQLSRHLNP-LHACYLIFGDDPWLLETSKDQIRQAAKRQGFEERVQLIQETGFNW 14386

Query: 62 NAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXI VRGNKL SKAQ 121 

+ QAMSLF+SR+ + L LP PA + I+ G KL+ Q

Sbjct: 14387 GDLTQEWQAMSLFSSRRIIELTLPSAKPGADGSAALQSLLQTPSPDVLLILEGPKLASEQ 14566 

Query: 122 ENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLAL 181 

N+ WF L + + + C TPE Q RW+ +R L L A +L YEGNLLA 

Sbjct: 14567 TNSKWFKTLDSLGIYLPCTTPEGDQFRRWLDSRIAHFKLNLQPDARAMLYSLYEGNLLAA 14746 

Query: 182 AQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGS 241 

QA++ LLLP + + D + FTF DALL + A H+L QL EG+ 

Sbjct: 14747 DQAMQLLQLLSPSKPIGADELSHYFEDQSRFTVFQLTDALLNNRQDSAQHMLAQLNGEGT 14926



Query: 242 E-PVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQ 300 



P++ Q+ +PL +LF KHR+W R+ + AL RLS Q+ 

Sbjct: 14927 AMPILLWALFKELQLLLSLKSEQAQGSPLNSLFGKHRIWDKRKPLYQTALQRLSLAQIEH 15106 



Query: 301 AVQLLTRTELTLKQDYGQSVWAELEGLSLLL CHKPLADVFID 342 

+ + + EL LKQ G WLLLL HLA + +D 

Sbjct: 15107 MLAFASKLELNLKQ-LGHEDWTGLSHLCLLFDPKAHSHLAHINLD 15238 



gnl|PAGP|Paeruginosa_Contig52 Pseudomonas aeruginosa unfinished fragment of complete ger.
Length = 872680

Score = 139 bits (347), Expect = 1e-32

Identities = 106/329 (32%), Positives = 155/329 (46%), Gaps = 8/329 (2%) 
Frame = -2

Query: 2 IRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDW 61

++L P QL L L Y++ G++PLL QE+ DA+RQ + F E F+ + N DW 
Sbjct: 245226 MKLTPAQLAKHLQGPLAPVYVVSGDEPLLCQEACDAIRQACRERDFGERQVFNAEANFDW 245047

Query: 62 NAIFSLCQAMSLFASRQTLLLLLPENGPN AAINEQXXXXXXXXXXXXXXIVRGNKLS 118 

+ ++SLFA ++ + L LP P AAI ++ + KL 
Sbjct: 245046 GLLLEAGASLSLFAEKRLIELRLPSGKPGDKGAAILQEYLQRPPEDTVLLLGLP KLD 244876 

Query: 119 KAQENAAWFTAL--ANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEG 176

+ + W AL N++ + QLP+W+ R Q L A + + + EG 

Sbjct: 244875 GSTQKTKWAKALIDGNAAQFIQVWPVDVHQLPQWIRQRLSQAGLSASPEALELIAARVEG 244696 

Query: 177 NLLALAQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQL 236

NLLA AQ +E+L LL ++ V+ AV D+A F F +DA L G++ AL IL+ L 
Sbjct: 244695 NLLAAAQEIEKLKLLAEGNQIDAATVQAAVADSARFDVFGLIDAALGGEAAHALRILEGL 244516 

Query: 237 RLEGSE-PVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHR--VWQNRRGMMGEALNRL 293

R EG E PVI PL F + R VW RR ++ AL R 

Sbjct: 244515 RGEGIEPPVILWGLAREIRLLAGLSQQYGQGIPLEKAFAQARPPVWDKRRPLLTRALQRH 244336 

Query: 294 SQTQLRQAVQLLTRTELTLKQDYGQSVWAELEGLSLL 330 

S ++ Q+L +L Q GQ+ + GLSLL 

Sbjct: 244335 SSSRWN QMLRDAQLIDAQIKGQAPGSPWSGLSLL 244234 

gnl|TIGR|t_ferrooxidans_626 Thiobacillus ferrooxidans unfinished fragment of complete ge
Length = 1632 

Score = 126 bits (313), Expect = 1e-28

Identities = 84/331 (25%), Positives = 148/331 (44%) 

Frame = -2 

Query: 2 IRLYPEQLRAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDW 61 

+RL P + L L + Y + ++PLLLQE++DA+ AA GF + + W 
Sbjct: 1184 MRLKPAHWASHLRGPLASVYGIFSDEPLLLQEAEDALMAAAAQHGFAQKQRLAQQDGGIW 1005 

Query: 62 NAIFSLCQAMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKAQ 121 

+A+ A SLFA+++ LLL L + + + Q + + 

Sbjct: 1004 DALRDERDAGSLFAAQRVLLLRLDSPKVPKEASAALQYWLASPPPDALLVLSGPRPDASI 825

Query: 122 ENAAWFTALANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLAL 181 

+ AWF + + PE PRWV R + ++ D AA Q+L GNL A 

Sbjct: 824 QKTAWFKGIETHGHTLLLYRPEGQDWPRWVEQRLRAAGMQADSAAVQLLTDLSAGNLGAC 645



Query: 182 AQALERLSLLWPDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGS 241
QA++RL + + P ++ + + D++ FT + DA+L G+ + + LH+L +LR 



Sbjct: 644 HQAIQRLQQVYPGQRIDAVAIRAVLADSSQFTIYDLADAVLRGETEHMLHMLDRLRNGDG 465 



Query: 242 EPVIXXXXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQA 301 

EP + ++ + A F ++R++ R+G + A RL+++ L+ 

Sbjct: 464 EPAL--CLWVLHKDLRLLAELRAGGVDVDAFFRQNRIFPPRQGWLRTAARRLTRSGLQXG 291

Query: 302 VQLLTRTELTLKQDYGQSVWAELEGLSLLLC 332 

++ + +K VW L L L +C 

Sbjct: 290 IKDCLAIDARIKGQDPTPVWPALTDLCLRMC 198 

gnl|Sanger|B.pertussis_Contig669 Bordetella pertussis unfinished fragment of complete
Length = 24999 

Score = 122 bits (302), Expect = 2e-27 

Identities = 89/313 (28%), Positives = 142/313 (44%), Gaps = 12/313 (3%) 
Frame = -1

Query: 17 LRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDWNAIFSLCQAMSLFAS 76

L Y + G+ + PLL+ E+ DA+R A A G+ + + +D +DW+A+ + Q+ + SLF 
Sbjct: 3297 I^PLYWSGDEPLLVTEAADAIRAAARAAGYTDRTSMVMDARSDWSAVAAATQSVSLFGD 3118 

Query: 77 RQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXX IVRGNKLSKAQENAAWFTALAN 132 

R+ L L +P P + E +V +L KA + W LA 

Sbjct: 3117 RRLLELKIPTGKPGKSGGEMLARLADQARDQADADTLVWALPRLDKATRESKWAQXLAR 2938 

Query: 133 RSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLSLLW 192 

V E+ +LP W+ R + D A Q + EGN LA Q + ++L LL+ 

Sbjct: 2937 GGVMADIANVERGRLPAWIGMRLGRXXQRADTATLQWMADKVEGNXLAAHQEIQKLGLLY 2758

Query: 193 PDGKLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGSE-PVIXXXXXX 251 

P+G+L VE+AV A + F DA+L G + R + +L LR EG P+ + 
Sbjct: 2757 PEGQLXAEDVERAVLXVARYDWGLRDAMLAGDTARTVRMLXGLRAEGEALPLVLWAVGE 2578 

Query: 252 XXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTRTELT 311

+ AL + R++ + +AL R++ AVQ + 

Sbjct: 2577 EIRLLARVAQARQQGQDAGALMRRLRIFGAHERLALQALGRVAPGAWPAAVQHAHEVDRL 2398

Query: 312 LKQDYGQSV WAELEGLSL 329 

+K G SV W E+ L+L 

Sbjct: 2397 IK GLSVPGRLADPWEEMTRLAL 2332 



gnl|Sanger|N.mening_Contig363 Neisseria meningitidis serogroup A unfinished fragment
genome 

Length = 76426 
Score = 115 bits (286), Expect = 2e-25 

Identities = 81/322 (25%), Positives = 137/322 (42%), Gaps = 2/322 (0%) 
Frame = +2 

Query: 10 RAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDWNAIFSLCQ 69

R + L+ Y++ G + LL E+ DA+R A QG+ + + D DWN + 

Sbjct: 62483 RIDTDAPLKPLYVIHGEEELLRIEALDALRAAAKKQGYLNREVYTADNAFDWNELLQTAG 62662 

Query: 70 AMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKAQENAAWFTA 129 

+ LFA + L L +P P E +V KL K Q + WF A 

Sbjct: 62663 SAGLFADLKLLELHIPNGKPGKTGGEALQDFAARLPEDTVTLVLLPKLEKTQLQSKWFAA 62842 

Query: 130 LANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLS 189
LA + + A LP+W+ R ++ L ++ A + EGNLLA Q + ++L 



Sbjct: 62843 LAAKGEVWEAKPVGAAALPQWIRGRLDKIGLGIEADALALFAERVEGNLLAARQEIDKLG 63022 



Query: 190 LLWPDG-KLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGSEPVIXX- 247

LL+P G++ +AV + AFF A + G R +LLREG EPV+

Sbjct: 63023 LLYPKGHTVNIDEAQTAVANVARFDAFQLAGAWMKGDVLRVCRLLDGLREEGEEPVLLLW 63202

Query: 248 XXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTR 307

+ + + + + R+W + + + + A+ R+S +L A+ + +

Sbjct: 63203 AVAEDVRTLIRLAAALKQGQS IQSVRNSLRLWGDKQTLAPLAVKRI S WRLLDALKTC AQ 63382

Query: 308 TELTLKQDYGQSVWAELEGLSLLL 331

+ +K W + L + L

Sbjct: 63383 IDRI IKGAEEGDAWTVFKRLWSL 63454





gnl|OUACGT|Ngon_Contig213 Neisseria gonorrhoeae unfinished fragment of complete genome
Length = 41162 

Score = 109 bits (271), Expect = 1e-23

Identities = 78/322 (24%), Positives = 136/322 (42%), Gaps = 2/322 (0%) 
Frame = -3

Query: 10 RAQLNEGLRAAYLLLGNDPLLLQESQDAVRQVAAAQGFEEHHTFSIDPNTDWNAIFSLCQ 69 

R + L+ Y++ G + LL E+ DA+R A QG+ + + D + DWN + 

Sbjct: 12126 RIDTDAPLKPLWIHGEEELLRIEAVIDALRAAAKKQGYLNREAYTADASFDWNELLQTAG 11947 

Query: 70 AMSLFASRQTLLLLLPENGPNAAINEQXXXXXXXXXXXXXXIVRGNKLSKAQENAAWFTA 129

LFA + L L +P P E +V KL K + + WF A 

Sbjct: 11946 NAGLFADLKLLELHIPNGKPGKNGGEALQDFAARLPEDTVTLVLLPKLEKTRLQSKWFAA 11767 

Query: 130 LANRSVQVTCQTPEQAQLPRWVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLS 189

LA + + A LP+W+ R ++ L ++ A + EGNLLA Q + + + L+ 

Sbjct: 11766 LAAKGEVWEAKPVGAAALPQWIRGRLDKIGLGIEADALALFAERVEGNLLAARQEIDKLA 11587

Query: 190 LLWPDG-KLTLPRVEQAVNDAAHFTPFHWVDALLMGKSKRALHILQQLRLEGSEPVIXX- 247

LL+P G++ +AV + AFF A+ R +L L EG EPV+ 

Sbjct: 11586 LLYPKGHAVYIDEAQTAVANVARFDAFQLAGAWMKADVPRVCRLLDGLEEEGEEPVLLLW 11407 

Query: 248 XXXXXXXXXXXXXXXQSAHTPLRALFDKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTR 307 

+ +++ + R+W +++ + A+ R+S +L A++ + 
Sbjct: 11406 AVAEDVTITLIRLAAALKQGQSIQSVTWSLRL 11227 

Query: 308 TELTLKQDYGQSVWAELEGLSLLL 331 

+ +K W + L + L 

Sbjct: 11226 I DRI I KGAEDGDAWTVFKQLWSL 11155 



gnl|TIGR|D.radiodurans_8857 Deinococcus radiodurans unfinished fragment of complete gene
Length = 22105 

Score = 37.5 bits (85), Expect = 0.064
Identities = 28/94 (29%), Positives = 42/94 (43%) 
Frame = -1 

Query: 150 WVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLSLLWPDGKLTLPRVEQAVNDA 209 

WV RAK++ L L+ A L + +L +A L +L LL G L RV+ V 
Sbjct: 14218 WVVTRAKKMGLRLERDAASYLAEVFGADLAGIAGELNKLELL--GGALNRERVQGIVGRD 14045 

Query: 210 AHFTPFHWVDALLMGKSKRALHILQQLRLEGSEP 243 

F + A G+ A+ L++L G +P 
Sbjct: 14044 PPGDSFAMLGAATAGRPGEAVLQLRRLLGSGEDP 13943 



gnl|PAGP|Paeruginosa_Contig44 Pseudomonas aeruginosa unfinished fragment of complete ger
Length = 203793 

Score = 30.5 bits (67), Expect = 8.2

Identities = 19/54 (35%), Positives = 25/54 (46%) 

Frame = +3 

Query: 274 DKHRVWQNRRGMMGEALNRLSQTQLRQAVQLLTRTELTLKQDYGQSVWAELEGL 327

D + Q R +G L +L QTQ V LL ++ Y V+A LEGL 

Sbjct: 157899 DGEAIAQLRTDELGGLLRKLRQTQQMALVGLLRNQDVATSLGYLARVYARLEGL 158060 



gnl|Sanger_1765|mbovis_Contig976.0 Mycobacterium bovis unfinished fragment of complete c
Length = 7357 

Score = 30.5 bits (67), Expect = 8.2 

Identities = 20/62 (32%), Positives = 31/62 (49%) 

Frame = -2 

Query: 150 WVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLSLLWPDGKLTLPRVEQAVNDA 209

W A QL + +D AA QV E +L ++ LE+ + ++ D + T PRV+Q 

Sbjct: 3981 WGANAGSQLQVFVD-AAGQVPQPVIENRVLLVSDPLEQIPW*DDDQRTRPRVKQVFGRR 3805 

Query: 210 AH 211 
H 

Sbjct: 3804 QH 3799 



gnl|TIGR|gmt3661 Mycobacterium tuberculosis unfinished fragment of complete genome
Length = 132053 

Score = 30.5 bits (67), Expect = 8.2

Identities = 20/62 (32%), Positives = 31/62 (49%) 

Frame = +2 

Query: 150 WVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLSLLWPDGKLTLPRVEQAVNDA 209 

W A QL + +D AA QV E +L ++ LE++ + + D + T PRV+Q 

Sbjct: 28898 WGANAGSQLQVFVD-AAGQVPQPVIENRVLLVSDPLEQIPW*DDDQRTRPRVKQVFGRR 29074 

Query: 210 AH 211 
H 

Sbjct: 29075 QH 29080 



emb|AL123456|MTBH37RV Mycobacterium tuberculosis H37Rv complete genome
Length = 4411529 

Score = 30.5 bits (67), Expect = 8.2 

Identities = 20/62 (32%), Positives = 31/62 (49%)

Frame = +2 

Query: 150 WVAARAKQLNLELDDAANQVLCYCYEGNLLALAQALERLSLLWPDGKLTLPRVEQAVNDA 209

W A QL + +D AA QV E +L + + LE++ + + D + T PRV+Q 

Sbjct: 1893830 WGANAGSQLQVFVD-AAGQVPQPVIENRVLLVSDPLEQIPW*DDDQRTRPRVKQVFGRR 1894006 



Query: 210 AH 211 

H 

Sbjct: 1894007 QH 1894012 



CPU time: 0.27 user secs. 0.60 sys. secs 0.87 total secs.

Database: Unfinished Actinobacillus actinomycetemcomitans 

Posted date: Dec 30, 1998 1:59 PM 
Number of letters in database: 1,888,023 
Number of sequences in database: 537 

Database: Complete Aquifex aeolicus 

Posted date: Aug 5, 1998 9:38 AM 
Number of letters in database: 1,551,335 
Number of sequences in database: 1 

Database: Complete Bacillus subtilis 
Posted date: Aug 5, 1998 9:38 AM 
Number of letters in database: 4,214,814 
Number of sequences in database: 1 

Database: Unfinished Bordetella pertussis 

Posted date: May 3, 1999 3:37 PM 
Number of letters in database: 3,987,145 
Number of sequences in database: 543 

Database: Borrelia burgdorferi 

Posted date: Aug 5, 1998 9:38 AM 
Number of letters in database: 1,229,458 
Number of sequences in database: 12 

Database: Unfinished Campylobacter jejuni 

Posted date: Nov 17, 1998 10:56 AM 
Number of letters in database: 1,641,480 
Number of sequences in database: 1 

Database: Complete Chlamydia trachomati 

Posted date: Aug 14, 1998 4:20 PM 
Number of letters in database: 1,042,519 
Number of sequences in database: 1 

Database: Unfinished Chlorobium tepidum 

Posted date: Feb 8, 1999 10:29 AM 
Number of letters in database: 2,257,254 
Number of sequences in database: 254 

Database: Unfinished Clostridium acetobutylicum 

Posted date: Mar 31, 1999 10:56 AM 
Number of letters in database: 3,943,874 
Number of sequences in database: 1 

Database: Unfinished Caulobacter crescentus 

Posted date: Feb 8, 1999 11:17 AM 
Number of letters in database: 4,177,031
Number of sequences in database: 3481 

Database: Unfinished Chlamydia trachomatis MOPN 

Posted date: Feb 8, 1999 11:21 AM 
Number of letters in database: 1,160,971 
Number of sequences in database: 624 

Database: Unfinished Deinococcus radiodurans 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 3,615,037 



Number of sequences in database: 869 

Database: Complete Escherichia coli 

Posted date: Aug 5, 1998 9:37 AM 
Number of letters in database: 4,639,221 
Number of sequences in database: 1 

Database: Unfinished Enterococcus faecalis 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 3,209,119 
Number of sequences in database: 293 

Database: Complete Haemophilus influenzae Rd 

Posted date: Aug 5, 1998 9:37 AM 
Number of letters in database: 1,830,138 
Number of sequences in database: 1 

Database: Complete Helicobacter pylori 26695 

Posted date: Jan 25, 1999 3:20 PM 
Number of letters in database: 1,667,867 
Number of sequences in database: 1 

Database: Complete Helicobacter pylori J99 

Posted date: Jan 25, 1999 3:55 PM 
Number of letters in database: 1,643,831 
Number of sequences in database: 1 

Database: Unfinished Mycobacterium avium

Posted date: May 17, 1999 1:55 PM 
Number of letters in database: 5,354,737 
Number of sequences in database: 692 

Database: Unfinished Mycobacterium bovis 

Posted date: May 10, 1999 1:17 PM 
Number of letters in database: 4,093,505 
Number of sequences in database: 931 

Database: Complete Mycoplasma pneumoniae 

Posted date: Aug 5, 1998 9:37 AM 
Number of letters in database: 816,394 
Number of sequences in database: 1 

Database: Unfinished Mycobacterium tuberculosis CSU#93 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 4,306,088 
Number of sequences in database: 42 

Database: Complete Mycobacterium tuberculosis H37Rv 

Posted date: Aug 14, 1998 4:20 PM 
Number of letters in database: 4,411,529 
Number of sequences in database: 1 

Database: Complete Mycoplasma genitalium 

Posted date: Aug 5, 1998 9:36 AM 
Number of letters in database: 580,073 
Number of sequences in database: 1 

Database: Unfinished Neisseria gonorrhoea 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 2,172,011 
Number of sequences in database: 159 



Database: Unfinished Neisseria meningitidis MC58 

Posted date: Feb 8, 1999 10:30 AM 
Number of letters in database: 1,406,901 
Number of sequences in database: 2533 

Database: Unfinished Neisseria meningitidis serogroup A 

Posted date: May 3, 1999 3:38 PM 
Number of letters in database: 2,166,687 
Number of sequences in database: 25 

Database: Unfinished Pseudomonoas aeruginosa 

Posted date: Mar 15, 1999 3:11 PM 
Number of letters in database: 6,246,116 
Number of sequences in database: 12 

Database: Unfinished Porphyromonas gingivalis W83 

Posted date: May 17, 1999 1:55 PM 
Number of letters in database: 2,334,787 
Number of sequences in database: 12 

Database: Unfinished Pasteurella multocida PM70 

Posted date: Jun 4, 1999 9:26 AM 
Number of letters in database: 2,034,447 
Number of sequences in database: 644 

Database: Unfinished Pseudomonas putida 

Posted date: May 10, 1999 3:21 PM 
Number of letters in database: 201,388
Number of sequences in database: 391 

Database: Complete Rickettsia prowazekii 

Posted date: Nov 16, 1998 3:20 PM 
Number of letters in database: 1,111,523 
Number of sequences in database: 1 

Database: Unfinished Staphylococcus aureus COL

Posted date: May 6, 1999 2:33 PM 
Number of letters in database: 3,071,880 
Number of sequences in database: 2177 

Database: Unfinished Staphylococcus aureus 

Posted date: Dec 30, 1998 2:00 PM
Number of letters in database: 733,437 
Number of sequences in database: 506 

Database: Unfinished Streptococcus mutans 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 1,438,835 
Number of sequences in database: 514 

Database: Unfinished Shewanella putrefaciens 

Posted date: Feb 8, 1999 11:22 AM 
Number of letters in database: 5,974,789 
Number of sequences in database: 2430 

Database: Unfinished Streptococcus pyogenes 

Posted date: Dec 30, 1998 2:00 PM 
Number of letters in database: 1,801,145 
Number of sequences in database: 181 



Database: Unfinished Streptococcus pneumoniae 

Posted date: Feb 8, 1999 10:31 AM 
Number of letters in database: 2,114,666 
Number of sequences in database: 270 

Database: Unfinished Salmonella typhi 

Posted date: May 3, 1999 3:38 PM 
Number of letters in database: 5,088,553 
Number of sequences in database: 185 

Database: Complete Synechocystis PCC6803 

Posted date: Aug 5, 1998 9:36 AM 
Number of letters in database: 3,573,470 
Number of sequences in database: 1 

Database: Unfinished Thiobacillus ferrooxidans 

Posted date: May 10, 1999 3:22 PM 
Number of letters in database: 3,488,401 
Number of sequences in database: 2870 

Database: Unfinished Thermotoga maritima 

Posted date: Feb 8, 1999 10:31 AM 
Number of letters in database: 2,352,161 
Number of sequences in database: 948 

Database: Complete Treponema pallidum 
Posted date: Aug 14, 1998 4:21 PM 
Number of letters in database: 1,138,011 
Number of sequences in database: 1

Database: Unfinished Vibrio cholerae 
Posted date: Feb 8, 1999 10:31 AM 
Number of letters in database: 4,145,671 
Number of sequences in database: 694 

Database: Unfinished Yersinia pestis 
Posted date: May 3, 1999 3:38 PM 
Number of letters in database: 4,937,945 
Number of sequences in database: 209 

Lambda K H 

0.321 0.134 0.00 

Gapped 

Lambda K H 

0.270 0.0470 4.94e-324 



Matrix: BLOSUM62 

Gap Penalties: Existence: 11, Extension: 1 

Number of Hits to DB: 44868843 

Number of Sequences: 537 

Number of extensions: 569311 

Number of successful extensions: 2852 

Number of sequences better than 10.0: 34 

Number of HSP's better than 10.0 without gapping: 15

Number of HSP's successfully gapped in prelim test: 6 

Number of HSP's that attempted gapping in prelim test: 2763 

Number of HSP's gapped (non-prelim): 198 

length of query: 343 

length of database: 40,264,755 




effective HSP length: 53

effective length of query: 290 

effective length of database: 39042946 

effective search space: 11322454340 

effective search space used: 11322454340 
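The length figures above are mutually consistent, which the short Python sketch below checks. It assumes the usual edge correction, in which the effective HSP length is subtracted once from the query and once per database sequence. The report does not state that rule, so the implied sequence count is only an inference. The function names are illustrative.

```python
def effective_search_space(query_length: int, hsp_length: int,
                           effective_db_length: int) -> int:
    """Effective query length multiplied by effective database length."""
    return (query_length - hsp_length) * effective_db_length


def implied_sequence_count(db_length: int, effective_db_length: int,
                           hsp_length: int) -> int:
    """Sequence count implied if one HSP length is removed per database sequence."""
    return (db_length - effective_db_length) // hsp_length


# Values copied from the statistics block above.
print(343 - 53)                                      # 290
print(effective_search_space(343, 53, 39_042_946))   # 11322454340
print(implied_sequence_count(40_264_755, 39_042_946, 53))  # 23053
```

The first two results reproduce the reported effective query length and effective search space exactly.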

frameshift window, decay const: 50, 0.1 

T: 13 

A: 40 

X1: 16 (7.4 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 41 (21.9 bits)
S2: 66 (30.1 bits)
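The bit values in parentheses follow from the raw values and the Lambda and K parameters printed above by the standard Karlin-Altschul conversion, S' = (λS − ln K) / ln 2. The Python sketch below applies it to numbers from this report. Which parameter pair goes with each threshold (ungapped for X1 and S1, gapped for X2, X3, S2 and the alignment scores) is inferred from the arithmetic, not stated in the report. The function names are illustrative.

```python
import math

UNGAPPED = (0.321, 0.134)   # Lambda, K from the first parameter table
GAPPED = (0.270, 0.0470)    # Lambda, K from the "Gapped" table


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized score in bits for a raw alignment score."""
    return (lam * raw - math.log(k)) / math.log(2)


def dropoff_bits(raw: float, lam: float) -> float:
    """X drop-off expressed in bits (no K term for a score difference)."""
    return lam * raw / math.log(2)


def expect(raw: float, lam: float, k: float, search_space: int) -> float:
    """Expected number of chance hits scoring at least `raw`."""
    return k * search_space * math.exp(-lam * raw)


print(f"{bit_score(66, *GAPPED):.1f}")        # 30.1  (S2)
print(f"{bit_score(41, *UNGAPPED):.1f}")      # 21.9  (S1)
print(f"{dropoff_bits(16, UNGAPPED[0]):.1f}")  # 7.4   (X1)
print(f"{dropoff_bits(38, GAPPED[0]):.1f}")    # 14.8  (X2)
print(expect(489, *GAPPED, 11_322_454_340))   # same order of magnitude as the reported 3e-49
```

Because Lambda and K are printed to only three significant figures, the recomputed Expect value for the Shewanella hit (raw score 489) matches the reported 3e-49 in order of magnitude rather than digit for digit.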






